A Machine Learning Approach for Affordance Detection of Tools in 3D Visual Data


Sebastian Ciocodeica

A dissertation submitted in partial fulfilment of the requirements for the degree of Bachelor of Science of the University of Aberdeen.

Department of Computing Science
2016

Declaration

I declare that this document and the accompanying code have been composed by myself and describe my own work, unless otherwise acknowledged in the text. It has not been accepted in any previous application for a degree. All verbatim extracts have been distinguished by quotation marks, and all sources of information have been specifically acknowledged.

Signed: Sebastian Ciocodeica
Date: 2016

Abstract

This report evaluates the possibility of using online 3D models of tools to adequately train a regression function for evaluating the affordances of tools. In this way, we assess how well a system can judge the usability of a variety of scanned tools for the task of hammering a nail. This is done by synthesizing a multitude of variations of tool models found online, generating an adequate amount of data that is then used as training data for the regression function. Our work has a human-like aspect to it and real-life applications, both of which are discussed fully in the first chapters. In this paper, the function is created by training a neural network on our generated models. We define three functions, each using a different set of training data, and compare them in tests involving either the same category or a mixed category of tools. At the end, we reflect upon what was achieved in the project and what direction future work should take.

Acknowledgements

I would like to thank, firstly, my supervisor Dr. Frank Guerin for the tremendous supervision and feedback throughout the term, as well as for giving me the opportunity to take part in this exciting research. Secondly, Paulo, for the huge amount of help given, all the plastic knives, and the patience required by my lack of geometry knowledge.

Contents

1 Introduction
   1.1 Overview
   1.2 Motivation
      1.2.1 Real World Connection
   1.3 Primary goals
   1.4 Secondary goals
2 Background and Related Work
   2.1 Computer Vision
      2.1.1 Image Processing
      2.1.2 Pointclouds
      2.1.3 Superquadrics
      2.1.4 Superquadric fitting
   2.2 Object segmentation
   2.3 Affordance
   2.4 Moment of Inertia
      2.4.1 Composite Moment of Inertia
   2.5 Tool Rotation
   2.6 Gaussian Mixture Models
   2.7 Artificial Neural Networks
3 Requirements
   3.1 Project Functional Requirements
   3.2 Performance Requirements
4 System Design
   4.1 Overview
   4.2 Acquiring models
   4.3 Segmentation
   4.4 Initial Model Generation design and assumptions
      4.4.1 Gaussian Mixture Model
      4.4.2 Monotonicity and Action Segment assumptions
   4.5 Current Model Generation design
      4.5.1 Model input and Superquadric Fitting
      4.5.2 Action part alteration
      4.5.3 Output
   4.6 Simulation preparation
      4.6.1 Folder structure generation
      4.6.2 Model positioning/rotation and X-Offset
      4.6.3 Mass and Inertia
   4.7 Simulation
      4.7.1 Mode and errors
   4.8 Neural Network
5 Implementation
   5.1 Segmentation
   5.2 Model generation
      5.2.1 Initial design issues
      5.2.2 Model input
      5.2.3 Action part alteration
   5.3 Simulation preparation
      5.3.1 Position Model
      5.3.2 Inertia
      5.3.3 Mass
      5.3.4 X-Offset
      5.3.5 Gazebo Folder Structure
   5.4 Gazebo Simulation
      5.4.1 Mode calculation
   5.5 Neural Network
6 Methodology and Tools
   6.1 Methodology
      6.1.1 Research
      6.1.2 Software development
      6.1.3 Backups
   6.2 Development tools
      6.2.1 Hardware
      6.2.2 Software
7 Evaluation and discussion
   7.1 Description of procedure
   7.2 Mixed category predictions
      7.2.1 Simulator Results Comparison
      7.2.2 Related Work Results Comparison
      7.2.3 Human Rating Comparison
      7.2.4 Direct Observation Comparison
   7.3 Inter category predictions
   7.4 Discussion
      7.4.1 Overall Performance
8 Conclusion and future work
   8.1 Conclusion
   8.2 Future work
A User Manual
   A.1 Usage Instruction
      A.1.1 Matlab
      A.1.2 Tool Generation
      A.1.3 Meshlab
      A.1.4 CPC Segmentor
B Maintenance Manual
   B.1 Software Required
   B.2 Matlab Toolboxes Required
   B.3 Installation
   B.4 Source Files
   B.5 Future Work

Chapter 1
Introduction

We start by giving a detailed presentation of the overview, motivation and goals of this project. Afterwards, we discuss the connection this project has with the real world, both in terms of its possible usage and in terms of the inspiration behind its design.

Figure 1.1: Deciding on a substitute tool for different tasks based on 3D data of surrounding tools and training data of a canonical tool.

1.1 Overview

Humans are capable of accurately deducing the possible usage of an object just by the way it looks. We can thus infer how an object can interact with the environment and how well it can act as a tool in different tasks. This has been dubbed perceived affordance (Gibson, 1977). Humans perceive the affordances of objects through their knowledge of, and experience with, their usage. Multiple attempts have been made to computationally emulate human judgement of affordance, for example by using high level knowledge to categorise objects and infer their usability, or by defining a function for each task that computes an individual score for each object based on low level knowledge of that object. This problem involves multiple challenges, including robot vision, knowledge of objects and 3D data manipulation. Our project attempts to create a system which learns the affordances of tools for different tasks using low level knowledge of objects.

Our main approach avoids having to hand-code a function specifying how important each aspect of a tool is, as this would require human knowledge. We want the system to create its own function, based on training data, with as little human input as possible. In this chapter, we make a case for why this work is important and worth evaluating. In the following chapters, the background information related to the project is presented more thoroughly. Afterwards, we present the requirements and design of the solution, contrasting the initial design with the final design and explaining their differences. Details on the methodology, development tools and implementation are found in the chapters that follow. Finally, we look over the results of our evaluation and discuss them, attempting to explain their underlying reasons.

1.2 Motivation

There is a growing interest in Machine Learning systems that perform well in open and unconstrained environments, where not all elements of the environment are predefined and hardcoded. One way this can be achieved is through the learning of affordance. Affordance is the known relation between an object and the effect it can have on the environment. Humans often use the affordances of objects to improvise tool usage, based on the experience gained from using other tools to similar effect. This results in an approximate knowledge of how the ideal tool for a task should be composed and how adequate other tools are, based on their shape. For example: a frying pan with a large enough base and enough mass could substitute quite well for hammering a nail, while a knife, having a small narrow end, would not work as well. A chopstick with a narrow enough tip could easily pierce a potato, a knife is thin and narrow enough to flip a pancake, and so forth. Humans often have an intuitive idea of how a tool has to look for it to be good or bad at a task. This could be due to internal mental simulations of using an observed or imagined tool. For example: we can easily look at an object and make up a scenario in which we use it to hammer a nail, resulting in a good idea of how useful it would be. We can also imagine what would happen if that observed tool were wider, sharper or rounder, or whether it is large enough to hit a nail or small enough to fit into a gap. We are interested to know whether this can be abstracted using Machine Learning and whether the results are relatable to human judgement.

1.2.1 Real World Connection

The most direct use for our project can be found in robotics, especially in relation to service robots. We ideally want a robot capable of operating outside of a closed environment. Even if it were to have inbuilt information about canonical tools, such as a hammer or spatula, these might not be available in the immediate area. A service robot capable of substituting alternative unknown tools is extremely useful for many tasks in everyday home environments. This problem is important considering how unfeasible it is to have a robot with pre-defined data of all possible tools and their variations. Taking inspiration from human mental behaviour, we attempted to come up with a solution to this problem. A representative scenario of the approach outlined in this paper could be:

- The model of the ideal tool needed for a task is coded into the robot's system. Alternatively, the robot is able to observe the ideal tool being used for a task and extract an abstract model and its usage.

- The robot then imagines variations of the observed tool, creating altered models based on measurements such as its size, shape, angle with the handle, sharpness, etc.
- The robot learns the importance of each parameter by internally simulating the modified tools in the previously observed task and analysing the results. This all happens while the robot is offline, when the robot has time, in order to save processing power for when it is needed.
- In a situation where an ideal tool is not available, the robot is able to scan alternative tools and quickly score how useful each substitute tool would be, based on its previous simulation experience. It can then fulfil the task using previously unobserved tools in the open environment it was placed in.

The problem goes beyond simply seeing a knife as a knife or a hammer as a hammer. Our system does not determine usage by categorisation, but by the overall characteristics of objects.

1.3 Primary goals

We define primary goals as the achievable objectives with the highest priority for the project. These are as follows:

- Acquire models of canonical tools used for the tested tasks.
- Extract low level details of the tools.
- Correctly generate multiple modified models from some original models.
- Simulate the previously created models, resulting in training data.
- Create a successful regression function able to adequately predict the usability of a substitute tool.
- Evaluate the resulting predictions, both for the same and other types of tools, and compare them with related work results and human judgement.

1.4 Secondary goals

These goals were seen as optional, and some were attempted within the allowed project time frame:

- Refine the machine learning to take into consideration the mass and inertia of tools.
- Refine the machine learning to include task specific parameters, for example: nail height, gap size, etc.
- Evaluate the scalability of the tool generation function across multiple types of tools.

Chapter 2
Background and Related Work

In this chapter we present the background information used and referred to in the project. It encompasses the results of the research conducted, in terms of both concepts and related work. While some of the material was not used for the final solution, it still holds importance due to the contrast it presents with the final solution; the reasoning for its lack of usage is given in Chapter 4, System Design.

2.1 Computer Vision

Computer vision is the field involved in developing methods for acquiring and processing images, often from real world environments, resulting in numerical information. Within this field, a big emphasis is put on attempting to replicate human capabilities in perceiving and understanding images. This includes the identification and segmentation of objects.

2.1.1 Image Processing

Although our project deals with 3D rather than 2D image processing, a small introduction is given. An image is defined as a function of two real variables, a(x, y), giving an amplitude (for example, brightness) at each pair of coordinates in space (x, y). An image may contain sub-images known as regions of interest, or ROIs. In this way we can model the fact that images often contain collections of objects, each representing an ROI. A 2D continuous image a(x, y) is divided into rows and columns, with the intersection of a row and column being called a pixel. In the general case, an image is a function of many variables, including depth (z), colour (c) and time (t); this formalism, a(x, y, z, c, t), is the building block of image processing algorithms (Young et al., 1998). 3D image capturing is often done by sensors capable of capturing RGB images (containing colour data) along with per-pixel depth information (D). One such RGB-D sensor is the Kinect scanner, which has made an impact on the market due to its affordability. A notorious problem of 3D image processing involves acquiring images of the environment and converting this data in such a way that object recognition from the images is possible. This is a major issue in areas of robotics where we want robots to be able to manipulate objects. Identification of real world images is not the only issue: systems also need to be able to process models which were created computationally. Computer-aided design (CAD) files, used in computer systems to aid the creation or modification of a design, are an example of one such type of file that we want systems to be able to recognise and manipulate. There is no standard algorithm for the acquisition, conversion and interpretation of CAD and general 3D models.

This is especially the case within robotics, where it is difficult to obtain good scans of objects in multiple types of environments.

2.1.2 Pointclouds

One way of representing 3D models is through pointclouds. A pointcloud is a set of data points within a coordinate system, usually defined on the X, Y and Z axes. They are often used to represent the external surface of an object. 3D scanners such as the Artec or Kinect can return a pointcloud file by measuring a large number of points on an object's surface. Some scanners are also capable of returning the normals of points or surfaces, which represent the orientation in space of each point. This technique is used for many purposes, including the creation of 3D CAD models for parts manufacturing. Typically, pointclouds are not directly used in 3D applications and are often converted to triangle mesh models through a process known as surface reconstruction. A triangle mesh is composed of a set of triangles, created by connecting points through vertices. Some surface reconstruction techniques, such as Delaunay Triangulation, involve the creation of triangles from the points in the pointcloud, resulting in mesh faces, as seen in Figure 2.1 (Young et al., 1998). Some scanning hardware, such as the Kinect or Artec scanners, can produce a complete mesh directly from scanning objects.

Figure 2.1: Pointcloud being converted to a triangle mesh. Source: (Young et al., 1998)

The use of pointclouds as a means of representing 3D models has become more widespread in recent years, especially across robotics. There exists a comprehensive, free library of algorithms for manipulating pointclouds and 3D data called PCL (Rusu and Cousins, 2011). The problem is that representing an object only as coordinate points is not enough for many tasks. Some attempts have been made to fix this limitation by modelling geometrical shapes as substitutes for the pointcloud visual data.
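As an illustration of the kind of surface reconstruction mentioned above, the following is a minimal Matlab sketch, assuming pts is an N-by-3 matrix of pointcloud coordinates. It recovers the boundary triangles of a 3D Delaunay tetrahedralisation, which is adequate for roughly convex parts but is not a full reconstruction pipeline:

    % Delaunay-based surface extraction (sketch; pts is N-by-3)
    DT = delaunayTriangulation(pts);       % 3D Delaunay tetrahedralisation
    [faces, verts] = freeBoundary(DT);     % triangles on the outer surface
    trisurf(faces, verts(:,1), verts(:,2), verts(:,3));  % visual check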

2.1.3 Superquadrics

For our project, we need a way of processing models of objects acquired from a range sensor. For this, we need a technique for processing the low level data of scanned models, the pointclouds, into 3D shapes and surfaces, such as cones, spheres, cylinders, etc. One such technique involves the usage of superquadric shapes, seen in Figure 2.2.

Figure 2.2: Various superquadric shapes. Source: (Jaklic et al., 2013)

Superquadrics are a flexible family of 3D parametric objects. The computational models of superquadrics can represent a variety of 3D geometric shapes (cones, spheres, boxes, etc.) with very simple parametrisation (Jaklic et al., 2013). They were first introduced into computer graphics by Barr (1981) with his work on the formulation of angle-preserving transformations, but only came to the attention of the computer vision community through the work of Pentland (1986). We are interested in the usage of superquadrics in relation to pointcloud models of objects. Pioneering work on the recovery of superquadrics with global deformations from a single-view pointcloud was conducted by Solina and Bajcsy (1990). They demonstrated that the recovery process is sensitive to noise, making stable object recognition by the use of superquadrics difficult. The last decade has seen great progress in this area by means of interpretation of the distorted scene. For example, Dickinson et al. (1997) present a method for segmenting and estimating the 3D shapes of objects from scanned object data. They start by segmenting the image into regions corresponding to the volumetric parts that make up the object. Basic shapes are inferred from this process, and afterwards superquadrics are fitted onto the object. Further early successes can be seen in works such as Leonardis et al. (1997), where it was shown that superquadrics can be directly recovered from unsegmented data without any presegmentation steps. This method was a vast improvement in speed and accuracy over Dickinson et al. (1997) due to the lack of pre-segmentation, giving rise to the recover-and-select paradigm. Improvements in speed on this method through the usage of random samples can be found in works such as Tao et al. (2004).

The problem with approaches such as those mentioned above is that they stand in direct opposition to purposeful object detection, as known objects do not require bottom-up segmentation. For this reason, in some cases, the above methods are not computationally effective. Biegelbauer et al. (2008) is an example of a more modern, successful usage of superquadrics for object recognition which attempts to solve this computational issue. Their system was able to fit superquadric models onto pointclouds of everyday objects in a one-side view of a cluttered scene acquired through a laser range sensor. The approximate geometric model description of the object is acquired using superquadrics, and then a probabilistic/voting system is applied, resulting in a good system for object recognition. This work represents a vast computational advantage over previous methods.

2.1.4 Superquadric fitting

The superquadric recovery technique used in our project is that of Abelha et al. (2015). Their technique uses a number of seed points on the pointcloud model. Each seed acts as the initial point for the fitting. The fitting consists of changing the parameters of the superquadrics to create different shapes in an attempt to approximate the shape of the pointcloud. The best fit is the closest approximation of the superquadrics to the model shape. The method used is taken from, and described fully in, Jaklic et al. (2013). An in-depth explanation is the following: there exists an inside-outside function able to check whether a point with coordinates (x, y, z) is inside, on the surface of, or outside a superquadric. F takes values depending on the position of the point relative to the superquadric (F < 1 inside, F = 1 on the surface, F > 1 outside):

F(x, y, z) = \left[ (x/a_1)^{2/\varepsilon_2} + (y/a_2)^{2/\varepsilon_2} \right]^{\varepsilon_2/\varepsilon_1} + (z/a_3)^{2/\varepsilon_1}

This function has five parameters, a_1, a_2, a_3, \varepsilon_1 and \varepsilon_2, which can be changed to vary the scale along each axis (a_1, a_2 and a_3 respectively) and the overall shape (\varepsilon_1 and \varepsilon_2). The model also allows parameters for the general position in space (p_x, p_y and p_z) as well as orientation in ZYZ Euler angles (\phi, \theta and \psi):

F(x_w, y_w, z_w) = F(x_w, y_w, z_w; a_1, a_2, a_3, \varepsilon_1, \varepsilon_2, \phi, \theta, \psi, p_x, p_y, p_z)

Given a set of 3D points of an object, the superquadric can be recovered by attempting different fits, with the best solution obtained from the minimization problem over the parameter vector \Lambda (recalling that F = 1 on the surface):

\min_{\Lambda} \sum_{i=1}^{n} \left( F(x_i, y_i, z_i; \Lambda) - 1 \right)^2

This technique proved to result in successful superquadric interpretations of pointclouds within their study (Abelha et al., 2015), and we make use of it within our project.
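To make the inside-outside test concrete, the following is a minimal Matlab sketch (our own naming, not code from Abelha et al. (2015)). It evaluates F for points already expressed in the superquadric's canonical frame and sets up the least-squares fitting cost; the real recovery would include the pose parameters and be run from several seed estimates:

    % Inside-outside function; P is N-by-3, L = [a1 a2 a3 eps1 eps2].
    % abs() keeps the fractional powers real for negative coordinates.
    F = @(P, L) ((abs(P(:,1))/L(1)).^(2/L(5)) + ...
                 (abs(P(:,2))/L(2)).^(2/L(5))).^(L(5)/L(4)) + ...
                 (abs(P(:,3))/L(3)).^(2/L(4));

    % Fitting cost: deviation of F from 1 over all cloud points.
    cost  = @(L) sum((F(pts, L) - 1).^2);
    Lbest = fminsearch(cost, [0.05 0.05 0.2 1.0 1.0]);  % seed: a rough cylinder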

2.2 Object segmentation

Segmentation of 3D objects into smaller, cohesive parts is a notoriously hard task in computer vision and remains an open area of research. The problem stems from the fact that segmentation requires a good deal of human knowledge and common knowledge of objects, which is extremely hard to model within a computer system.

One type of segmentation is the segmentation of objects from a space such as a table or the ground. One modern solution for this can be found in Richtsfeld et al. (2014), where a framework is presented for the segmentation of unknown objects from RGB-D images. It works through a step of data abstraction followed by object segmentation, with good results on objects within cluttered scenes. One major limitation of this work is that in situations where objects are stacked side by side, pre-segmentation will incorrectly group a patch of the image, making the subsequent grouping incorrect.

The type of segmentation that we use is the segmentation of a tool into separate, distinguishable parts. Several attempts have been made to solve this problem. Kalogerakis et al. (2010) created a system for 3D segmentation and labelling based on training data and object classification. While achieving good results on the Princeton Segmentation Benchmark (Chen et al., 2009), the final product is not a generic segmentation method, as it is dependent on class-specific labels and training data, with issues such as connected parts of an object sharing a label (Kalogerakis et al., 2010). Another study, by Xu et al. (2014), conceived a segmentation algorithm able to use transferable knowledge from a database of previous segmentations, without any need for item categorisation. Although categorisation is not needed, its performance is still based on previous experience, which is ultimately dependent on the quality of the initial segmentation: an initial segmentation of a dog which does not see the neck as a separate part will result in future segmentations also not viewing the neck separately.

The methods described above fail to achieve human-like performance without the use of ground-truth training examples; as such, an unsupervised, bottom-up approach is preferable for our project. For our project, the segmentation of tools into their various parts was done using the Constrained Planar Cuts (CPC) algorithm (Schoeler et al., 2015). An image of a tool segmented with CPC can be seen in Figure 2.3. The algorithm is part of PCL, an open source library of algorithms for processing pointclouds (Rusu and Cousins, 2011). CPC is a bottom-up, unsupervised method for the segmentation of 3D pointclouds into separate parts. The algorithm works without the need for top-down object knowledge by detecting local concavities and creating cuts throughout the tool, resulting in partitions. Although other methods of the same type exist, such as Randomized Cuts (Golovinskiy and Funkhouser, 2008) or Random Walks Segmentation (Lai et al., 2008), the CPC algorithm performs significantly better in comparison due to its design. The CPC algorithm design is based on concavities: local concavities in tools offer a powerful indicator of a segment border. It uses an adjacency graph over supervoxels (Papon et al., 2013) and the LCCP algorithm (Christoph Stein et al., 2014) to determine whether an edge is convex or concave. To further demonstrate the capability of the algorithm, CPC proves to be highly efficient across multiple types of objects, performing significantly better than other existing bottom-up methods and similarly to top-down methods on the Princeton Segmentation Benchmark (Chen et al., 2009). The success of this approach can also be seen in works such as Abelha et al. (2015), in which multiple household items were properly segmented, resulting in a grasp and an action segment.

Figure 2.3: An example of CPC segmenting a hammer model into three distinguishable segments.

The basic idea of convexity and concavity being a proper basis for segmentation can be seen in other works as well (Christoph Stein et al., 2014). Schoeler's segmentation algorithm is thus a state-of-the-art method for part segmentation, which is the main reason for its usage within this project.

2.3 Affordance

Affordance is a term coined by the psychologist James J. Gibson. In his article from 1977, he defines affordances as an agent's perceived possible actions between themselves and the environment (Gibson, 1977). For example, a door knob affords a twisting motion, and a knife affords a stabbing motion. The psychological concept has been adapted and used within computer vision. Earlier works attempted to use a function-based approach on 3D CAD models of objects to assign them to a certain category, deciding their usage/affordance (Stark and Bowyer, 1994). Other work focused on the recognition of parts of objects with a certain affordance. Saxena et al. (2008) describe a learning algorithm able to recognise the grasping part of an object based on two or more images of the object. The main strength of this work is that only partial triangulation is required around the decided grasp area, not the entire model. In another study, Myers et al. (2015) evaluate two bottom-up approaches to learning affordances based solely on local shape and geometry recognition. Their results returned precise predictions of a tool's identity, from which affordance was assumed, albeit at a considerable processing cost. The main weak point of this and Saxena's work is the lack of relationship data between parts. For example, in a spatula, the relationship between the angle of the head and the handle is important; if the angle were different, the tool might be hard to use. But for primitive-part affordances these methods function very well. In a study by Ugur and Piater (2015), a bottom-up approach to developing symbolic planning operators is tested. This was achieved through the self-exploratory actions of a robot in an environment using its action repertoire. The learned affordances of items were then tested in a stacking, tower-building task, giving good results.

The biggest fallback of this study is that only one action, namely stacking, was tested in its planning generation and execution. Both studies above focus on learning methods using visual features and lack higher level knowledge of the relationships between object parts. They will thus struggle to detect the affordance of a tool whose segments are related at a different angle from canonical tools; affordance equivalence will be lost simply because the tools look significantly different.

The most relevant past work related to this project is that done by Abelha et al. (2015). Their system used handcoded models of canonical tools for certain tasks, for example a hammer for hammering nails or a chopstick for piercing a potato. These models were coded in terms of superquadrics and the relationships between the shapes. An attempt is then made to fit these high level models onto pointclouds of scanned tools which are inputted into the system. The output is a numerical value of how good a fit the pointclouds are, thus abstracting how good a candidate substitute tool each is for certain tasks. The handcoded functions were a mix of linear and quadratic functions and were designed using human judgement about how much each property of the parts of a tool contributes to a certain affordance. An ideal model would be projected onto an object, for example a chopstick: the handle of the ideal tool would be mapped to one part of the chopstick and the action segment of the ideal to another part. The functions would then assess the overall affordance of the chopstick by scoring the aspects of each mapped part; these aspects could contribute both positively and negatively. We use parts of the design of this paper in our approach, and results from that study are contrasted with the results of this project in the evaluation chapter. Our approach uses machine learning instead of hardcoded functions, as the amount of human judgement needed to create those functions can be seen as the main issue with that paper.

2.4 Moment of Inertia

Calculating the moment of inertia was needed for the simulation of each individual tool. The moment of inertia, or rotational inertia, is a measurement of an object's resistance to rotation about an axis. An example to visualise it is the difference in difficulty between swinging a hammer by its handle and swinging it by its head. The moment of inertia of a point mass is given by

I = r^2 M

where r is the distance from the axis of rotation and M is the mass of the particle. When the object is not a single point, we break it into slices of mass dm, each contributing an individual moment of inertia

dI = r^2 dm

2.4.1 Composite Moment of Inertia

For this project, there was a need to calculate the inertia of an entire tool from its segmented parts. We assumed homogeneous density for the models, and the material of some 3D models was assumed, for example stainless steel for a hammer. The approach to calculate the composite moment of inertia was as follows (a code sketch is given after the list):

1. Divide the tool into segments (already done by the segmentation algorithm).
2. Fit superquadrics onto the segments, resulting in simple shapes from which we can easily calculate the individual moments of inertia.
3. Calculate the centroid of each part.
4. Find the centroid Y of the overall object.
5. Calculate the moment of inertia of each part, I(i), about its own centroidal axis.
6. Calculate the orthogonal distance from the centroid of each part to the overall centroid: d(i) = Y - y(i).
7. Find the transfer term for each part, as given by the parallel axis theorem: m(i) d(i)^2, where m(i) is the mass of the part.
8. Finally, find the overall moment of inertia with the formula:

I_{total} = \sum_i \left[ I(i) + m(i) d(i)^2 \right]
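The following Matlab sketch illustrates steps 3-8 under our homogeneous-density assumption. The variable names (partI, partMass, partCentroidY) are ours, and the per-part inertias about their own centroidal axes are assumed to have been computed already from the fitted superquadrics:

    % Composite moment of inertia via the parallel axis theorem.
    % partI(i):         inertia of part i about its own centroidal axis
    % partMass(i):      mass of part i (density * superquadric volume)
    % partCentroidY(i): centroid of part i along the axis of interest
    Y = sum(partMass .* partCentroidY) / sum(partMass);  % overall centroid
    d = Y - partCentroidY;                               % transfer distances
    Itotal = sum(partI + partMass .* d.^2);              % step 8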

2.5 Tool Rotation

Within our project, we made use of robotics simulation software. To use our tool models correctly within the simulated world, we needed to position the models accordingly. Tool positioning was implemented using rotation matrices. A rotation matrix is a concept from linear algebra used within Euclidean spaces for point rotation. In the project solution, the 3D formalism was used. This gives the coordinate rotation about the X, Y and Z axes that a point has to make for the 3D pointcloud to move in a clockwise or counter-clockwise manner. This is possible since any rotation can be given as a composition of rotations about three axes, a result of Euler's Rotation Theorem. A rotation can thus be represented by a 3 x 3 matrix operating on a vector, in the form seen in Figure 2.4.

Figure 2.4: General matrix rotation notation.

This project uses the basic rotation matrices for counter-clockwise rotations about the X, Y and Z axes, as seen in Figure 2.5.

Figure 2.5: Rotation matrices for counter-clockwise rotations about X, Y and Z.
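As a concrete reference for the matrices in Figure 2.5, a minimal Matlab sketch of the three basic rotations (angles in radians, counter-clockwise about each axis) and an example composition:

    % Basic counter-clockwise rotation matrices about X, Y and Z.
    Rx = @(a) [1 0 0; 0 cos(a) -sin(a); 0 sin(a) cos(a)];
    Ry = @(a) [cos(a) 0 sin(a); 0 1 0; -sin(a) 0 cos(a)];
    Rz = @(a) [cos(a) -sin(a) 0; sin(a) cos(a) 0; 0 0 1];

    % Any rotation is a composition of the three (Euler's Rotation Theorem);
    % applied here to an N-by-3 pointcloud P:
    R = Rz(psi) * Ry(theta) * Rx(phi);
    Protated = (R * P')';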

2.6 Gaussian Mixture Models

A mixture model is a probabilistic model used to represent subgroups within an overall data group, without a separate set of observed data identifying each individual sub-group. A Gaussian Mixture Model (GMM) assumes that all data points are generated from a combination of a finite number of Gaussian distributions across a number of dimensions. The probability density of the mixture results from the addition of the probability of each weighted Gaussian distribution. An example of GMMs being used in computer vision can be seen in Zivkovic (2004), where an adaptive algorithm using a GMM probability density distribution is applied to background subtraction in images, with good results. While our project did not use an adaptive algorithm, we considered the idea of populating a probabilistic distribution and sampling from the resulting mixture distribution. The idea proved to be ineffective due to the issues arising from fitting too few instances of high dimensional data. These issues are addressed in works such as Krishnamurthy (2011).

2.7 Artificial Neural Networks

Artificial Neural Networks are a family of computational models inspired by the biology of the neuron, first computationally defined in McCulloch and Pitts (1943). They are made up of highly interconnected processing elements, called nodes, and process information through their state response to external inputs. A network is composed of multiple layers of nodes: an input layer, hidden layer(s) and an output layer. A neural network is trained by the use of training data, resulting in different degrees of importance for each connection; a neural network is thus able to model the importance of each input parameter. The data used for training is typically divided into training, validation and testing data. Neural networks are a Machine Learning technique which can be used for tasks such as regression and classification problems. Considering the hand-modelled tasks shown in Section 2.3, we expect our affordance problem to require modelling quadratic behaviour; neural networks are able, with sufficient layers, to model the possible functions we might need. Since we did not explore the internal functionality of a neural network, we use it in a black-box manner.
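As an illustration of this black-box usage, a minimal Matlab sketch of training a feed-forward regression network. The variable names are assumptions for the example: X is a 10-by-N matrix of model vectors and y is a 1-by-N vector of simulation scores:

    % Feed-forward regression network on (model vector, simulation score) pairs.
    net = fitnet(10);                    % one hidden layer of 10 neurons
    net.divideParam.trainRatio = 0.70;   % training / validation / test split
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    net = train(net, X, y);              % backpropagation training
    usability = net(Xnew);               % predicted scores for unseen tools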

Chapter 3
Requirements

The requirements described here act as only one criterion for the progress of the project. As this project involves a level of research and exploration, the requirements presented here cannot act as the only criterion for completion. For the same reason, the list went through many changes throughout the development process, with the version below being the final established one.

3.1 Project Functional Requirements

We define here the functional requirements of all individual elements, as well as the overall need of the project. The overall requirement of the project is defined as: to evaluate the possibility of using online models and their synthesized variations for a regression function serving as a task model for evaluating the affordances of tools. The partial requirements are:

- To acquire good models of canonical tools used for our tested tasks.
- To rescale tool models if necessary.
- To segment models correctly with the CPC algorithm, clearly defining an action and a grasping part.
- To adequately fit superquadric shapes onto the model pointcloud, resulting in individual shape and relationship parameters.
- To be able to get overall model parameters.
- To be able to correctly modify each parameter, resulting in a new model.
- To generate a number of modified models by iteratively changing each parameter of the action part and its relationship with the grasping part. Each iteration should result in a new model being written.
- To calculate all values needed for writing the simulation files describing the models. This includes: mass, centre of mass and composite inertia of the tools.
- To prepare the simulation preliminaries for Gazebo. This includes creating the right folder and file structure.
- To set the tool model in the correct position and direction for simulation.

- To correctly set up the simulation software such that the models appear correctly, possible errors are accounted for and the result is in a reasonable format.
- To simulate each modified tool a number of times, getting an overall numerical indication of its usefulness. This should be done in an automatic way.
- To format the results from simulation such that they are usable as training data.
- To use the model parameters and simulation results as training data for the creation of a regression function within a neural network.
- To evaluate the accuracy of the function in comparison with related work and human judgement. Different formats of the input data could be tested and compared.
- To evaluate how well the solution manages with different categories of tools and comparisons.

3.2 Performance Requirements

Here we define the requirements in terms of their relevance to performance. Although they are not as critical as the functional requirements, a good amount of effort will be put into adhering to them.

- The software is scalable to a large number of possible 3D models beyond the canonical model for each task.
- A huge amount of processing power should not be needed.
- No crashes or slowdowns should occur under high usage. This is most likely during superquadric fitting.
- Generation and simulation of tools should take a reasonable amount of time.
- Training of the regression function should take a reasonable amount of time.

Chapter 4
System Design

This chapter presents the various topics related to the design of the pieces of software involved in the project. Explanations and justifications for these choices are given and discussed. Parts of the original design had to be changed to accommodate the various limitations that were discovered during implementation. These changes are discussed and motivated throughout the sub-sections. The more technical aspects of how each part currently functions can be found in Chapter 5, Implementation.

4.1 Overview

Given the nature of the problem, the design was split into a number of steps for a better conceptualization of the solution. Each part came with its unique challenges, and some had to be approached from a different angle. A big emphasis is put on contrasting the initial design and ideas with what was finally selected as the ideal solution. Due to the exploratory nature of this project, problems were encountered often, leading to frequent changes in the design. To avoid clutter, only the most relevant problems are written of in this chapter. The major parts of the solution are:

- Acquiring models
- Tool Segmentation
- Generating models
- Preparation for simulation
- Simulation
- Learning

The segmentation and fitting of superquadric shapes is explained in depth in Chapter 2, Background and Related Work. Upon fitting the superquadric shapes on a tool model, each shape results in a vector of parameters representing the superquadric's shape and its relationship with the other shapes. Of these, the most important shapes are the grasping segment, the part which would normally be grasped by a human, and the action segment, the part which is responsible for the action of the task. These parameters are of great importance, as they are the numerical reasoning behind our generation and training. Out of these individual parameters, we compose a vector of ten parameters for the entire model.

Figure 4.1: A hammer model being fitted by superquadric shapes.

Most of these parameters are taken from the action part, which has to be clearly segmented. The reasoning behind its importance is discussed when explaining model generation. The ten parameters of the model vector are as follows:

- Three values for scale: X, Y and Z, representing the space occupied in Euclidean space. These are the X, Y and Z of the action segment.
- Two values for shape: Shape 1 and Shape 2, representing how squared or rounded the shape is. The difference between Shape 1 and Shape 2 is the positioning of the shape change. These are also taken from the action segment.
- Two tapering values: Tapering 1 and Tapering 2. They represent a level of distortion in the model resulting in sharpness; the difference between the two is the direction of the sharpness within the segment. Also taken from the action segment.
- Three relationship values between the action and grasp segments: angle Z, angle centres and distance centres. Angle Z is the angle of the segment with the Z axis, while angle centres represents the angle between the vectors of the grasp and action segment centres. The distance centres parameter represents the distance between the two segment centres.

The overall in-depth plan is then this: we download models from online sources for each type of canonical tool (hammer for hammering, spatula for lifting pancakes, etc.). We convert all models to pointclouds and segment them, such that a grasping and an action segment are clearly separated. We choose one model tool as the ideal looking tool for a task, for example one of the downloaded hammers for hammering a nail. We then do a superquadric fitting on the model, resulting in a vector of parameters for each segment. Out of these parameters we compose the vector for the entire model. A generation function then produces many modified models of the original tool by changing its model vector parameters, for example making the hammer head larger, smaller or wider, or changing the angle between the handle and the hammer head.

Each model is then simulated, resulting in a value representing how well it performed on the task. The resulting values are then used as training data for a neural network, creating a regression function. Finally, we evaluate this function against a number of other measures.

Figure 4.2: Steps of the process of affordance learning.

4.2 Acquiring models

The first major problem was that of acquiring enough models for an adequate training of the neural network. We initially had access to twenty-one already segmented pointcloud models. This would not be nearly enough models to correctly train a neural network, so more models were required. The initial idea of simply scanning enough different models using the Kinect 2 and Artec scanners, which were used to acquire the initial tools, very quickly proved to be inadequate. The amount of time needed to properly scan tools was too high, and the number of real world tools that would need to be modelled came into question as well. For this reason, it was decided that a better option was to download pointcloud models from different online sources. Most sources relied on uploads from their online communities. Sources that were found included:

- 3DWarehouse
- Thingiverse
- 3DCadBrowser
- Yobi3D
- TF3DM

It was thought that bulks of models could be downloaded based on some keyword search, after which the downloaded models which were not the literal tool would be removed, for example hammer car or tank models instead of the hammer tool. A first search for tools found that there were not as many free models available as initially thought. Also, some tools were easier to find than others: hammers were significantly more frequently uploaded than chopsticks or ladles. For example, a 3DWarehouse query for hammer gives 1203 results, compared to chopstick with 34 and ladle with 31. An important note is that a significant part of the hammer query results are not the literal tool but models of other equipment named hammer. For this reason, even a bulk download followed by a filtration of the models would not have been adequate. While this might have a higher chance of relative success for hammers, it is significantly more difficult for the spatula or ladle, which have very few results, most of which are not the literal tool.

For this reason, the decision was taken to use generated data. Initially, this approach was seen as less desirable and less powerful for evaluation, since we would be creating data specifically for this project instead of using open environment data. Ultimately, it was seen as an interesting direction for our solution. This decision was also supported by related work on affordance achieving good results despite likewise working with generated data (Schoeler and Worgotter, 2016). The generated data would have to have a basis, which would be the role of the downloaded models. Tools were downloaded from 3DWarehouse, due to the higher number of tool models available, which made the task of finding good models easier; for example, a query for hammer returns 1203 results on 3DWarehouse compared to 702 on Thingiverse or 19 on 3DCADBrowser. We decided to focus on the hammering task first, due to the relative ease of finding models. This was also the only task we managed to test thoroughly within the project's time frame. Eleven hammers were downloaded as a model basis. Some more unconventional hammers were downloaded to be tested, such as a warhammer, seen in Figure 4.3.

Figure 4.3: An unconventional hammer model: a war hammer.

4.3 Segmentation

Our next step was to segment the acquired models, such that each hammer has a clear action and grasp segment, even if other smaller segments are present. The segmentation software used is the CPC algorithm, explained in Section 5.1. Our main objective is to correctly segment the part of the tool which comes into direct contact with the target during the simulated task. This is because the identified action segment will be the segment altered during generation, and it should be clearly defined to avoid unrealistic results during simulation.

A number of conversions first needed to be done before the segmentation software could read the files. Some models were part of a collection of tools or of a scene and needed to be cropped, as seen in Figure 4.4.

Figure 4.4: A hammer as part of a scene.

This further shows that automatic gathering of tools can be difficult to implement: it would require the system to have a high level understanding of which part of the scene to crop, which defeats the purpose of our project. Another issue was the variety of scales at which the tools were created. Our models varied from a hammer being 3 centimetres long to some being 20 metres long. The segmentation algorithm uses three dimensional spaces of a certain fixed size, so abnormally small or big models would give strange results. Our models were rescaled such that all were in a range between 25 and 40 centimetres. Our first attempt at segmentation found that most of the models did not have enough points to be properly segmented, an example being seen in Figure 4.5; the segmentation algorithm could not find the local concavities needed to produce more segments. One likely reason why these models were not created with many points is that it was not a necessity: having more points than necessary makes a model harder to process, so they were likely created with just enough points to have defined, visible faces.

Figure 4.5: A hammer pointcloud with very few points.

To solve this issue, a few filters from Meshlab were considered to add points to the models. This required testing of different methods, as some filters worked better than others and some did not function well at all, causing holes in the models and making them lose their shapes. After finding the right filter, all our models were read correctly by the software, as even those with a low number of points became high quality pointclouds, as seen in Figure 4.6.

Figure 4.6: Hammer model after a filter has been applied, adding a large number of points.
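A minimal sketch of the kind of uniform rescaling described above (our own illustration; pts is an N-by-3 pointcloud and targetLen an assumed target length in metres, e.g. 0.3 for a value within the 25-40 cm range):

    % Uniformly rescale a pointcloud so its longest bounding-box side
    % matches a target length.
    extent = max(pts) - min(pts);            % bounding box per axis
    pts = pts * (targetLen / max(extent));   % single uniform scale factor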

We had to get acquainted with the functionality of the segmentation software. This was a process of trial and error in which the parameters of the software were iteratively increased or decreased until we had a clearly defined action and grasping part. Segmentation was done on all 11 hammers with good results. This emphasises that the CPC algorithm was indeed the right choice of software for the project. Even the more unconventional models were segmented in an appropriate way, with the segment used to hit the nail being separated, as seen in Figure 4.7.

Figure 4.7: Segmented warhammer resulting in 5 segments.

At this point, we are able to load any of the segmented pointclouds and test the superquadric fitting algorithm, resulting in a good shape fit for all models.

4.4 Initial Model Generation design and assumptions

Initially, we intended generation and simulation to be one element. Our reasoning was to have a loop in which we generate a new model with changed parameters, simulate it to get its usability score, and then assess whether the score was getting worse in the direction of that parameter. If a few models gave bad results in a row, the loop would stop, stating that it had found a boundary for that parameter. This idea was not feasible due to technical issues, discussed more in depth in the implementation chapter, resulting in each element being independent.

4.4.1 Gaussian Mixture Model

The first idea for generating new models was to implement a Gaussian Mixture Model.

Each model would fill one point in the ten dimensional space of model vector parameters. All models of one type would together populate the space, and generated data would be drawn/sampled from it. The main problem with this approach was, again, the lack of initial models. When dealing with a set of about ten parameters, resulting in a ten dimensional space, the small number of instances we could fill the space with would not be enough for adequate sampling. With such a small initial pool, the values sampled from the space would not be representative of hammer parameters, as observed and mentioned in other works (Krishnamurthy, 2011). Another problem involved the overall dimensional space: even if we sampled hammers from that space, the results would be huge in number. With 10 sample values per parameter and the 10 model vector parameters, we are looking at 10 to the power of 10 combinations, which would make our neural network take a significant amount of time to train. Sampling from anywhere within the 10 parameters would also result in tools which would not exist in the real world. Changing all ten of our parameters in a composite way would therefore not only result in huge amounts of data, but also in unfeasible tools. There was also the possibility that sampling from the Gaussians of normal hammers would never give us the parameters of other tools, such as a frying pan, which would still be a good tool for hammering. A sketch of this considered approach is given at the end of Section 4.4.

4.4.2 Monotonicity and Action Segment assumptions

We then refined our next idea, which involved taking our fitted models and altering one parameter at a time. The argument is that by altering one parameter independently, our system can learn the individual contribution of each. We decided to alter each parameter in an increasing and a decreasing direction. We also make a monotonicity assumption about the alterations: when a tool starts becoming worse, we assume it will only become worse in that direction, increasing or decreasing. For example, a hammer with a small head will make hammering difficult, since it will slip off or miss the nail; making the hammer head even smaller will only make its usability worse. A downside to this assumption is that we might fall into a local minimum during the alteration of some parameters, although we could not think of any situation where this would be the case. So as not to rely purely on human judgement, it was decided to generate some data beyond the point where results started getting lower. In related work on affordance (Abelha et al., 2015), each parameter's importance was scored independently in the hardcoded task functions, and the study achieved good results. As such, there is no reason why our study should have lesser results due to our assumption of the independence of parameters. An important aspect of how we designed our model parameter vector is our assumption of the independence of the action segment. Our model vector contains seven parameters taken from the segment identified as the action superquadric. We did not take into consideration values from other segments, as we went under the assumption that the action part holds the key to any tool's usability. While there might be tools with different types of handles, for the most part, humans are capable of grasping anything. Handles of tools in real life also tend to be fairly stable, being made for the human grip.
Our system at this point should not be concerned with the various shapes a handle might have, as for the most part this does not affect how usable a tool is. Generation thus focuses only on the action segment and on the relationship between it and the grasp segment. Future work could focus on automating the decision of whether a tool is graspable or not.
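For completeness, a minimal Matlab sketch of the GMM idea we considered and rejected (hypothetical names; X is an M-by-10 matrix of model vectors from the M initial tools). With M around ten, fitting even a small mixture in ten dimensions is ill-conditioned, which is the sampling problem described above:

    % Fit a Gaussian mixture to the existing model vectors and sample new ones.
    % With only ~10 instances in 10 dimensions the covariance estimate is
    % degenerate, so regularisation is required and samples are unrepresentative.
    gm = fitgmdist(X, 2, 'RegularizationValue', 1e-4);  % 2-component mixture
    newVectors = random(gm, 1000);                      % 1000 sampled "tools"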

29 4.5. CURRENT MODEL GENERATION DESIGN Current Model Generation design Here we will present the model generation process, as it is designed now. The work related to preparing input and folder structure for the simulation software is discussed in the following section. Parts of the implementation used pre existing software from related work (Abelha et al., 2015). We will mention where this happens during the following sections. Figure 4.8: Flowchart of the model generation process Model input and Superquadric Fitting The process starts by taking as input the segmented pointcloud of the model tool as well as a number of parameters, including a grasp and action point which are used to determine the respective segments. Giving no grasp and action input results in the user being asked to click on where these points exist, as seen in Figure 4.9. This is a provisory arrangement to the simulated idea of a service robot getting all the parameters through vision. That is, a robot would be shown a demonstration of how each tool is used, taking note of where the tool is normally grasped, what part makes contact with the target and at what direction is the action performed. We then fit superquadric shapes, as previously explained in Section The model vector is then extracted using an already existing function which takes the parameters of each superquadric, returning the relevant overall model information. The process then extract the model mesh information(pointclouds, normals and faces) for all superquadric shapes, except the action segment. Model mesh information is calculated by pre-existing software. The model vector and mesh are

Figure 4.9: User interface asking the user to click on a grasp and an action point.

The model vector and mesh are stored as the original tool data. Finally, the tool is aligned such that it is put in the position required by the simulator.

4.5.2 Action part alteration

At this point, the loop for tool generation starts; a sketch of the loop is given at the end of this subsection. For each batch of new models we manually set the number of models to be created and the numerical change added to or subtracted from the parameter on each iteration. On each loop, a copy is made of the original model vector and mesh data, and the chosen parameter of the vector is modified. The action segment is altered accordingly, depending on the parameter changed; an example can be seen in Figure 4.10. The modified action segment is then added to the model, by first turning the segment into a pointcloud using existing software. The number of points in the pointcloud is then downsampled for easier processing during generation and simulation; our initial generations were slow, and good results in the simulator were possible, and faster, with a lower number of points. We then use the Delaunay Triangulation method to create mesh faces for our segment and add the action segment to the model. The altered model of this loop iteration is now ready to be saved as a file.

Figure 4.10: Modified hammer with a changed shape parameter. The superquadric shape which was fitted to the action segment, the hammer head, can be seen enlarged.

During some initial generation runs, we noticed that decreasing some parameters did not seem to modify the tools. This was because, although the action part was being modified and concatenated to the model, the original action part was still present at its initial location: it was copied, not removed, from the initial space. As such, any alteration where the shape got smaller was not noticeable in that area, while bigger resulting shapes were noticeable in the space outside of the original position.
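A minimal Matlab sketch of the generation loop described above, with our own naming: modelVec is the 10-element model vector, paramIdx the parameter being swept, delta the per-iteration change, and alterAndWriteModel a hypothetical stand-in for the alteration, meshing and output steps:

    % Sweep one model-vector parameter, writing one altered model per step.
    nModels = 20;           % models per batch (set manually per run)
    delta   = 0.005;        % change applied to the parameter each iteration
    for k = 1:nModels
        newVec = modelVec;                              % copy of original vector
        newVec(paramIdx) = newVec(paramIdx) + k*delta;  % alter one parameter
        alterAndWriteModel(newVec, k);                  % stand-in: reshape action
    end                                                 % part, mesh, write files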

4.5.3 Output

Once we have a simulation-ready mesh model, we prepare the output for the simulator and the neural network. We first calculate the X-Offset value, which is the distance needed between the hammer and the nail to make it possible for the hammer to correctly hit. Afterwards, the moment of inertia of the tool, as well as its mass and centre of mass, are calculated. Finally, the new modified model is written to a file using pre-existing software, and the model vector for the altered tool is appended to a text file. The last part needed is creating the folder structure, with the files needed for the simulator to work correctly. Each iteration results in a separate folder with a model ready to be simulated.

4.6 Simulation preparation

Most of the modifications to the output were made while getting familiar with the simulation software. In this way, we found that both mass and inertia were relevant to the simulation and needed to be calculated for each model. In this section, we look at how each is designed to be calculated.

4.6.1 Folder structure generation

The simulation software used requires a strict folder structure in order to function. Each simulated task has its own folder, which contains subfolders for each entity in the simulated world; in the case of hammering, this is the nail, the box and all the generated hammers. Each task folder contains a world file which states which of the models will be present in the simulation. For each model, we create a folder with the name of the tool and the number of the current loop iteration, for identification. Inside each tool folder, we have the model file, which is converted and copied into the folder after it is created, as well as the configuration file and the SDF file. Our system writes the config file, which is standard for all altered models. The SDF file contains information about the tool; it is written in the same way as the configuration file, adding the previously calculated X-Offset, centre of mass, mass and inertia in their required places. The SDF and config files are written in XML format. Initially, we thought of using software to manipulate the XML elements, but then realised that writing the XML as strings is easier. The folder structure can be seen in Figure 4.11.

Figure 4.11: The folder structure needed for simulation.
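As an illustration of writing the SDF as strings, a minimal Matlab sketch; the tags shown are a pared-down subset of a Gazebo SDF model file, and the names and values are placeholders rather than our exact output:

    % Write a minimal SDF file as plain strings (values are placeholders).
    f = fopen(fullfile(toolDir, 'model.sdf'), 'w');
    fprintf(f, '<?xml version="1.0"?>\n<sdf version="1.5">\n');
    fprintf(f, ' <model name="%s">\n  <link name="body">\n', toolName);
    fprintf(f, '   <inertial>\n    <mass>%.4f</mass>\n', mass);
    fprintf(f, '    <inertia><ixx>%.6f</ixx><iyy>%.6f</iyy><izz>%.6f</izz>', I(1), I(2), I(3));
    fprintf(f, '<ixy>0</ixy><ixz>0</ixz><iyz>0</iyz></inertia>\n   </inertial>\n');
    fprintf(f, '   <visual name="v"><geometry><mesh><uri>model://%s/mesh.dae</uri></mesh></geometry></visual>\n', toolName);
    fprintf(f, '  </link>\n </model>\n</sdf>\n');
    fclose(f);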

4.6.2 Model positioning/rotation and X-Offset

We start positioning our model by computing the rotation matrix between the grasp vector and the Z axis and rotating the pointcloud to align it with the Z axis. The grasp and action centre positions are updated to follow this rotation. We then check whether the tool is upside down along the Z axis by checking whether the centre of the action shape is below the centre of the grasp shape. If it is, we rotate the model 180 degrees about the Y axis, using the appropriate rotation matrix. The segments, as well as the vectors, are rotated and translated accordingly.

We afterwards need to find the action vector of our tool, as the superquadric fitting might have positioned a shape facing in the opposite direction to what we need. We define the action vector as going from the centre of the action shape to the projection of the action centre onto the Z axis. We then calculate the rotation needed for the action vector to match the demonstration vector, which is a parameter of our generation function, and align it accordingly; the rest of the model elements are translated after rotation. This is done because we want the vector of action to be aligned identically with the action that needs to be performed. We move the tool base centre to the origin in the X and Y axes. Finally, we move the tool up along the Z axis so that the base (the minimum point in Z) sits on the origin, and update the positioning data of all superquadrics.

For types of tools such as the chopstick and screwdriver, the function took the action segment and pointed it to the side, making the tool look similar to a hammer, as seen in Figure 4.12. It is almost as if the system decided to reposition the shapes, thinking the tool would function better in that position. This is likely due to the function pointing the segment towards the demonstration vector. As this would only affect the generation of models, it did not affect our evaluation. The issue was fixed by skipping the vector alignment in cases where this distortion would happen.

Figure 4.12: Screwdriver model being distorted during action part positioning.

The X-Offset is the distance needed between the hammer and the nail such that the hammer falls correctly onto the nail during simulation. We compute it by first projecting the centre of the action shape: we take the X and Y values of the grasp centre and the Z value of the action centre, creating a point. We then add the height of the box to the Z value, as the hammer is sitting

on top of a box during simulation; this creates our final projection of the action centre. We then create a vector using the largest point along the X axis, 0 as the Y value and the final projected centre as the Z value. The largest numerical value of X is taken because that is the direction the demonstration vector points towards. Taking the norm of this vector gives the distance from the bottom of our box to the top of the target (DistTop). The distance to the target (the tool distance) is then found using the Pythagorean formula, as seen in Figure 4.13: ToolDistance = sqrt(DistTop^2 - targetHeight^2), where the target height is a parameter of the generation function. Finally, our pose offset in X is the target distance, which is a parameter of the generation function, minus the tool distance previously found.

Figure 4.13: Figure showing the calculation of the tool distance as the result of the Pythagorean formula with the distance to the top of the nail and the height of the target.

4.6.3 Mass and Inertia

The centre of mass of the tool is found as the centroid of the whole pointcloud using a Matlab function (kmeans). The mass of the tool is found by multiplying the density of the tool with the total volume of all superquadrics. Individual volumes are found using existing software. The density value is deduced from the material of the tool, given as a parameter to the tool generation function. We calculate the composite moment of inertia of the model using the method found during research, described in the Composite Moment of Inertia section of the background chapter; the design follows that method exactly, and we refer to that section for further explanation.

4.7 Simulation

We initially considered manually labelling the usability of tools using human judgement, but that felt superficial. Instead, we used the task simulation made by Krasimir Georgiev using Gazebo. The task simulation, the models used in the simulation world (nail, boxes), as well as the logic, were implemented by Krasimir Georgiev; some small changes to the original code were made for the purposes of our project. The simulation spawns the hammer and a nail, both on top of a box. The nail can be pushed into the box on impact. When the simulation starts, the hammer is moved towards the nail,

making it hit the nail and then returning it to an upright position. When the nail is hit, the distance the nail is pushed into the box is outputted. A screenshot of the simulator can be seen in Figure 4.14.

Figure 4.14: The simulator running a hammering simulation with our chosen model, before and after nail collision.

We needed a way to automatically simulate each previously generated hammer. An initial idea involved making Gazebo add each model, simulate it, remove it and replace it with the next one. This did not work, as removing tools is not a trivial task in Gazebo, so simulating all the tools could not happen in one instance of Gazebo. Instead, a bash script was written so that each hammer would be simulated automatically, one after another. The world file, which states the paths of the models that should be in the simulation, is set such that the hammer model always points towards a hammer_training folder. Our script iterates in sorted order through each folder, changing the folder name from its original name, hammer_training_(number), to hammer_training, the name pointed at by the world file. The simulation was changed such that it closes the application after collision with the nail. The script starts Gazebo unpaused, so the hammer hits the nail before the GUI is fully initialised, making the process faster. The distance of the nail is then appended to an output.txt file. The folder's name is then changed back to its original name, starting a new loop. The code of the Gazebo simulation was also changed so that the simulation closes after 2 seconds of no impact with the nail, avoiding the loop getting stuck in cases where a hammer ends up not hitting the nail.

4.7.1 Mode and errors

The Gazebo simulator is prone to a number of errors which might cause the simulation not to start at all. There are also situations where a tool behaves strangely, resulting in a lucky strike and an exaggerated nail distance. For this reason, we decided to simulate each hammer ten times and create a function that takes the mode of the trial results of each set of hammer simulations. With this function, even if a hammer model has a simulation where it manages an exaggerated score, it will not affect its overall score. We initially considered calculating the mean of the set of simulations, but an abnormally high score would result in unrealistic mean scores. We also initially had the script run each hammer five times, but this still resulted in a high number of unrealistic scores, so we changed the number to ten.

Initially, error texts following a failed simulation were outputted alongside the results. Each text error was then changed to a 0 within the mode calculation function. This did not

work, as there were situations where the error would happen but the simulation would still run and output a distance, resulting in an incorrect number of results and making the mode calculation impossible, as the function checks that the number of results in the output file is a multiple of the number of trials per model. We solved this by outputting a begin-trial and end-trial text at the beginning and end of each set of tool simulations. The function counts the number of outputs between these markers and, if it is less than the intended number, scores of zero are added to make up the count. This produces more realistic results. There were also cases where the distance appeared as a negative number; the function was changed so that all negative numbers are rounded up to 0. After these modifications, the function returned mode scores which were representative of the overall behaviour of each tool. The returned mode results were then manually copied to an output.txt file to be used as training data.

4.8 Neural Network

For our learning, we decided to make use of an Artificial Neural Network. One reason is the nature of the problem: we assume we have to model a complex non-linear system rather than a closed-form equation. Our Neural Network takes the form of a multilayer perceptron, trained with backpropagation of error. This type of network was chosen for its ease of creation and understandability of design. We define our problem as regression/function approximation using supervised learning. The possibility of defining the problem as one of categorisation was discussed, in which case tools would have been labelled as: very good, good with effort, slightly effective, no good. It was decided that producing numerical output and then abstracting the predictions into categories would be better, as this makes our evaluation easier considering the different categories available in the compared data. Numerical output would not work as well for tasks such as pancake lifting; this is something to consider for future work.

Our network is designed as a two-layer feed-forward network with sigmoid hidden neurons and linear output neurons. The network uses the mean squared error to calculate error levels. We use the Levenberg-Marquardt backpropagation algorithm for training; this algorithm typically requires more memory but less time. Training stops automatically when generalisation stops improving, as indicated by an increase in the mean squared error of the validation samples. This can be seen in Figure 4.15. We map between the input file, containing the model vectors from the generation, and the output file, containing the results from the simulation attempts. We set 70% of our samples for training, 15% for validation and 15% for testing. We decided on 10 hidden neurons, as a good number revolves around the number of input and output nodes. We train the neural networks until the training function decides that the mean squared error has reached a plateau. Three different functions are created, as explained in the Evaluation chapter (Chapter 7).

Figure 4.15: Plot showing the mean squared error of the neural network at each epoch.
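For concreteness, the set-up described above corresponds roughly to the following Matlab Neural Network Toolbox script. We built our networks through the toolbox's guided interface rather than this exact code, so the file names and variable names here are illustrative.

% A minimal sketch of the network set-up described above.
inputs  = importdata('input.txt')';   % model vectors, one column per tool
targets = importdata('output.txt')';  % mode nail distances from the simulator

net = fitnet(10, 'trainlm');          % 10 sigmoid hidden neurons, linear output,
                                      % Levenberg-Marquardt backpropagation
net.divideParam.trainRatio = 0.70;    % 70% training
net.divideParam.valRatio   = 0.15;    % 15% validation (early stopping)
net.divideParam.testRatio  = 0.15;    % 15% testing
net.performFcn = 'mse';               % mean squared error

[net, tr] = train(net, inputs, targets);
genFunction(net, 'HammeringTask');    % export a standalone predictor function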

Chapter 5

Implementation

This chapter presents more technical information on how the parts of the solution function. It omits the details of how the borrowed software is designed. Algorithms are given for each noted function.

5.1 Segmentation

Models downloaded from 3DWarehouse were initially in the SketchUp .skp format and had to be converted to the Wavefront .obj geometry definition file format. This was done using SketchUp Make. Models of tools that were part of a collection of tools or a scene were also cropped using SketchUp. To solve the issue regarding the number of points in models, the following Meshlab filters were tested:

Filters/Sampling/Dust Accumulation: worked for some tools, but some ended up with missing parts of the model. This is likely due to the direction in which the dust particles were applied, and multiple applications were not possible as all model faces were lost after one application of Dust Accumulation, as seen in Figure 5.1.

Figure 5.1: Hammer pointcloud with missing parts of the shape on one side.

Filters/Remeshing, Simplification and Reconstruction/Marching Cubes: did not always connect the points of some segments.

Filters/Sampling/Stratified Triangle Sampling: worked best for all hammer models. We used points sampled on a random connection basis.

After applying the filter, the .obj files were saved as .ply. The .ply files were then loaded into Matlab using the borrowed ReadPointCloud function, which returns a structure P. The structure contains: v - the vertices, n - the normals, u - the point coordinates in the pointcloud, segms - the segments, each being a structure, and f - the faces. Each segment is a structure of coordinates of its points, which are part of the overall collection of point coordinates.

The tools were rescaled using the rescaling algorithm ToolScalling. This was needed due to the VCCS voxel size used by the segmentation. The measurements are in meters.

Algorithm 1 Tool rescaling
1: procedure RESCALEHAMMER(pointcloud)
2:   scaled_pcl ← pointcloud
3:   max_dim ← max(range(pointcloud))
4:   while max_dim < 0.25:
5:     scaled_pcl ← scaled_pcl * 10
6:     max_dim ← max(range(scaled_pcl))
7:   while max_dim > 0.4:
8:     scaled_pcl ← scaled_pcl / 10
9:     max_dim ← max(range(scaled_pcl))

Models were then outputted in the .pcd format, which is a format supported by the segmentation software. This is done using the borrowed WritePcd function. Segmentation was done using the CPC segmentation algorithm from the PCL library. The parameters for segmentation are Cut 1, Cut 2, Seeds, Voxel size and Smoothing. We had access to the parameters that worked best for the tools used in Abelha et al. (2015), as seen in Figure 5.2.

Figure 5.2: The segmentation commands used for the initial tools.

To segment our tools, we tested and changed the parameters until we found the command needed to properly segment each, as seen in Figure 5.3. These parameters seem to work well for other tools within the same category and scale. An example CPC segmentation terminal command and its result can be seen in Figure 5.4.

Figure 5.3: The segmentation commands used for the acquired hammers.

Figure 5.4: Example of a CPC command and the resulting segmented model.
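Returning to the rescaling of Algorithm 1, a directly runnable Matlab version might look as follows; 'points' is assumed to be an N-by-3 matrix of coordinates in meters, and range requires the Statistics Toolbox.

% A runnable Matlab version of Algorithm 1: scale the pointcloud by powers of
% ten until its largest extent falls inside the working range of the segmenter.
function points = RescaleTool(points)
    max_dim = max(range(points));      % largest extent over X, Y and Z
    while max_dim < 0.25               % too small: scale up
        points = points * 10;
        max_dim = max(range(points));
    end
    while max_dim > 0.4                % too large: scale down
        points = points / 10;
        max_dim = max(range(points));
    end
end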

5.2 Model generation

5.2.1 Initial design issues

The initial idea of having the generation and the simulation run together was not feasible. We intended to have Matlab call the Gazebo simulation through a function issuing terminal commands, but this did not work correctly. All instances where Matlab called terminal commands worked well except the Gazebo initialisation. Matlab did not recognise the Gazebo commands when calling the function, likely due to the dynamic library paths it creates for each session. One attempt at solving this involved manually clearing the library session paths of executables, which would force Matlab to use the system's PATH. The changed variables were OSG_LD_LIBRARY_PATH, LD_LIBRARY_PATH and XFILESEARCHPATH. This managed to run Gazebo, but it did not find the needed nail and hammer models, so the simulation did not function correctly. This was strange, as running the exact same command through a normal terminal resulted in normal functionality. One possible explanation is that Gazebo sessions started through the system function do not have the defined model paths that a directly started Gazebo session would.
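For reference, a sketch of the attempted workaround is shown below; the world file name is illustrative. As described above, this launched Gazebo but left it unable to find the models.

% Clear Matlab's session library paths so the system call resolves the
% system's own libraries, then try to launch Gazebo from within Matlab.
setenv('LD_LIBRARY_PATH', '');
setenv('OSG_LD_LIBRARY_PATH', '');
setenv('XFILESEARCHPATH', '');
status = system('gazebo hammering.world');  % ran, but without our model paths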

5.2.2 Model input

The function takes the path of the model and the model name, along with the following parameters: grasp and action point (optional), demo_action_vec, targetHeight, targetDist, boxHeight and material. The parameters should depend on the simulated task, but our current function is made only for the hammering task. Demo_action_vec is the action vector of the simulation. Target height is the length of the nail. Target distance is the distance from the simulated model to the nail. The box height is the height of the box the nail is set on. Material is what the model is theoretically made of. The parameters for our generations are: grasp=[ ], action=[ ], hammering vector [1 0 0], targetHeight 0.078 meters, target_dist 0.325 meters, boxHeight 0.05 meters and material stainless steel.

The model's original data (pointcloud, mesh info and superquadrics) is received by the GetModelMesh function. Empty grasp and action input parameters result in asking the user to click on these points, using the Matlab scatter3 and view([0 90]) functions. The X and Y position of each point is selected by clicking; the Z is entered by the user when shown a view([0 0]) of the model, which displays the Z axis. FreeFitting, returning superquadric data structures (SQs), and ExtractModel, returning the model structure (model), are both borrowed software. The mesh model info (pointcloud, normals, faces) of the SQs without the action SQ is extracted using the borrowed SQsToPCL software. Each SQ (superquadric) structure is composed of 15 parameters: 3 for scale, 2 for shape, 3 for Euler angles, 2 for tapering, 2 for bending and 3 for central position. The overall model vector is calculated using parts of the SQ parameters, especially those of the action SQ. It is composed of action SQ parameters 1-5 and 9-10, as well as model.angleZ, model.angle_centers and model.dist_centers.

Algorithm 2 Get Model Mesh Info
1: procedure GETMODELMESH(path, grasp/action point, action_vec)
2:   pointcloud/segms ← ReadPointCloud(path)
3:   if grasp/action point = null then grasp/action point ← GetGraspActionPoint(pointcloud)
4:   SQs ← FreeFitting(segms)
5:   model ← ExtractModel(SQs, grasp/action point)
6:   for SQ in SQs:
7:     if SQ != actionSQ then SQs_wout_action += SQ
8:   MeshInfo(points/normals/faces) ← SQsToPCL(SQs_wout_action, segms)
9:   PositionModel(model, meshInfo)

5.2.3 Action part alteration

The numerical amount modified on each iteration is set by the variable modification. Input.txt is where the model vector is written for each model. The action SQ is altered by ModifyActionPart. The action part is then turned into a pointcloud by the borrowed SQ2Pcl_Archimedes function, which samples points on the model. DownsamplePCL is also used to increase processing speed. Delaunay is the Matlab function for Delaunay triangulation, used to turn points into triangle meshes. The borrowed WritePly function is used to output the final model. The modifications on the action part, in ModifyActionPart, happen depending on the changed parameter in the following ways:

For changes in scale, shape and tapering (the first 7 of the model vector parameters), the identical parameter in the action superquadric is changed to the new value from the model vector. Visual changes are then seen after writing the file.

For changes in the Z angle (parameter 8) and the angle between centres (parameter 9), a rotation and translation about the Y axis is done using the appropriate rotation matrix, the new location being stored in the action segment vector. The amount of rotation is calculated as the difference between the new and the original model vector.

The distance between centres (parameter 10) is changed by first calculating a distance proportion, dividing the new parameter by the old model.dist_centers parameter. The SQ centre parameters are then changed by multiplying them with the distance proportion.

Algorithm 3 Tool generation
1: procedure PCDTOSIM(path, grasp/action_point, action_vec, targetHeight/Dist, boxHeight, material)
2:   pointcloud/model/SQs_orig ← GetModelMesh(path, grasp/action_point, action_vec)
3:   modification ← 0
4:   file ← file_path_input.txt
5:   for i = 1 to number of models:
6:     point/model/SQs ← point/model/SQs_orig
7:     modification ← modification + step
8:     new_model_vector ← SQs.action/model.angles/model.dist
9:     new_model_vector(param) ← new_model_vector(param) + modification
10:    SQs.action ← ModifyActionPart(model, SQs.action, new_model_vector)
11:    action_pcl ← SQ2Pcl(SQs.action)
12:    action_pcl ← Downsample(action_pcl)
13:    action_faces ← Delaunay(action_pcl)
14:    model ← model + action_pcl
15:    X_Offset ← XOffsetCalc()
16:    inertia/center_mass/mass/density ← CalcCompMomentInertia(SQs, material)
17:    WritePly(model)
18:    input_txt ← new_model_vector
19:    CreateGazeboModelFolderStruct(loopIndex, X_Offset, model)

5.3 Simulation preparation

5.3.1 Position Model

Positioning of the model operates by using the Matlab toolbox function vrrotvec, which calculates the rotation between two vectors, and vrrotvec2mat, which translates the rotation into a matrix representation. Other functions are used, such as eul2rotm and rotm2eul, to translate between Euler angle and rotation matrix representations. The pre-existing ApplyTransformations function is used for the translation of the SQs in space.
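As a minimal sketch of the basic alignment step used in Algorithm 4 below (variable names illustrative):

% Rotate the pointcloud so that the grasp vector lies along the Z axis.
r = vrrotvec(graspVec, [0 0 1]);  % axis-angle rotation taking graspVec to Z
R = vrrotvec2mat(r);              % convert axis-angle to a 3x3 rotation matrix
points = (R * points')';          % apply to the N-by-3 pointcloud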

Algorithm 4 Model Positioning
1: procedure POSITIONMODEL(pointcloud/model info, demo_action_vec)
2:   r ← rotationMatrix(model.grasp.vec, [0 0 1])
3:   rotate(r)
4:   rotatePointcloud(r)
5:   rotateGrasp/ActionCentres(r)
6:   updateSQs
7:   if model.action.SQ(Z) < model.grasp.SQ(Z) then
8:     r ← rotationMatrix(pi, Y)
9:     rotate(r)
10:    rotatePointcloud(r)
11:    rotateGrasp/ActionCentres(r)
12:    updateSQs
13:  central_base_point_xy ← (model.graspSQ(X), model.graspSQ(Y), 0)
14:  center_action_vec ← model.actionSQ(X,Y,Z) - central_base_point_xy
15:  center_action_vec ← (center_action_vec(X,Y), 0)
16:  model.action.vec ← center_action_vec / norm(center_action_vec)
17:  r ← rotationMatrix(model.action.vec, demo_action_vec)
18:  rotate(r)
19:  rotatePointcloud(r)
20:  rotateGrasp/ActionCentres(r)
21:  updateSQs
22:  center_base ← model.grasp.SQ(X,Y,Z)
23:  model.grasp.SQ/action.SQ ← model.grasp.SQ(X,Y,Z) - center_base
24:  up_vec ← min(Z)
25:  model.grasp.SQ(Z)/action.SQ(Z) ← model.grasp.SQ(Z)/action.SQ(Z) - up_vec
26:  r ← rotationMatrix([0 0 1], model.action.vec)
27:  rotate(r)
28:  rotatePointcloud(r)
29:  rotateGrasp/ActionCentres(r)
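The upside-down check in lines 7-12 of Algorithm 4 can be sketched as follows, assuming actionCentre and graspCentre are 1-by-3 positions; eul2rotm (Robotics System Toolbox, 'ZYX' convention) builds the 180-degree rotation about Y.

% If the action centre sits below the grasp centre in Z, the tool is upside
% down: flip it 180 degrees about the Y axis.
if actionCentre(3) < graspCentre(3)
    R = eul2rotm([0 pi 0]);   % yaw 0, pitch pi, roll 0 -> rotation about Y
    points = (R * points')';
end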

5.3.2 Inertia

The existing code for returning the inertia of an individual superquadric is called MomentInertiaSQ. Kmeans, the Matlab function, is used to find the pointcloud centroid.

Algorithm 5 Inertia
1: procedure CALCCOMPMOMENTINERTIA(SQs, pcl, material)
2:   center_mass ← kmeans(pcl)
3:   for i in SQs:
4:     inertiaPart(i) ← MomentInertiaSQ(i)
5:   MassCalc(SQs, material)
6:   for i in SQs:
7:     IParts(i) ← Project(inertiaPart(i), axisReference)
8:   for i in IParts:
9:     inertiaSum ← inertiaSum + IParts(i)
10:  inertiaSum ← inertiaSum * density

5.3.3 Mass

VolumeSQ is borrowed software that finds the volume of each SQ. The mass comes from having both volume and density; we assume the tool is homogeneous, made of a single material.

Algorithm 6 Mass
1: procedure MASSCALC(SQs, material)
2:   tot_vol ← 0
3:   for i in SQs:
4:     tot_vol ← tot_vol + VolumeSQ(SQ(i))
5:   density ← density(material)
6:   mass ← density * tot_vol

5.3.4 X-Offset

Algorithm 7 X-Offset
1: procedure XOFFSETCALC(pointcloud/model info, boxHeight, targetDist)
2:   proj_center_action ← (model.graspSQ(X,Y), model.actionSQ(Z))
3:   final_proj_center_action ← proj_center_action + boxHeight
4:   DistTop ← norm((max(X), 0, final_proj_center_action))
5:   ToolDistance ← sqrt(DistTop * DistTop - targetHeight * targetHeight)
6:   Pose_XOffset ← targetDist - ToolDistance

5.3.5 Gazebo Folder Structure

We define the Gazebo model path as the folder where all the simulations will be executed. The full path variable is then this path concatenated with the name of the tool and the number of the

iteration. A call to the terminal uses meshlabserver to convert the .ply file to the Collada .dae file needed by Gazebo. Meshlabserver is a feature of Meshlab that allows its commands to be used without the GUI. writemodelconfig and writesdffilehammer write the files by writing XML tags as strings to text files.

Algorithm 8 Create Gazebo Structure
1: procedure CREATEGAZEBOMODELFOLDERSTRUCTURE(loopIndex, mesh_number, X_Offset, model)
2:   gazeboModelPath ← path
3:   fullPathName ← path/tool_name + loopIndex
4:   makedir(fullPathName)
5:   writeModelConfig(fullPathName)
6:   writeSdfFileHammer(fullPathName, X_Offset, model(center_mass/mass/inertia), loopIndex)
7:   copyfile(meshPath, fullPathName)
8:   terminalCommand(meshlabserver(.ply to .dae))

5.4 Gazebo Simulation

Some modifications were made to the initial C++ code. The simulation now closes by calling a terminal command that kills the gzserver process after the hammer hits the nail. The simulation also continuously tracks simulation time and closes the process after 2 seconds of no collision. For the Ubuntu script, special care had to be taken to ensure correct ordering and renaming, as initial attempts resulted in an ordering of 1, 10, 2. Since we want an ordered simulation of our tools, this was fixed to an ordering of 1, 2, 3 and so forth. Print functions in the pseudocode represent output to the console; printToFile represents appending to the output.txt file responsible for holding the results of the simulation.

Algorithm 9 Simulation Script
1: procedure GAZEBOSIMULATION
2:   for P in sorted directories hammer*:
3:     prevName ← FolderName(P)
4:     print(P)
5:     P ← changeName(hammer_training)
6:     printToFile(start_trial)
7:     for i in 1 to 10:
8:       print(i)
9:       printToFile(gazebo hammering.world)
10:    printToFile(end_trial)
11:    hammer_training ← changeName(prevName)

5.4.1 Mode calculation

Epsilon is the variable stating how close one real number needs to be to another for them to be declared equal. start_trial was chosen as the text declaring the start of a set of trials, while end_trial declares the end. The algorithm abstracts away the epsilon verification for each set of trials and the mode output.
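The core of the epsilon-based mode can be sketched in Matlab as follows; this is a simplified stand-in for GetModeOutputTask (Algorithm 10 below), not the exact implementation.

% Mode of one set of trial results: negative distances are clamped to 0 and
% near-equal real values are binned to within epsilon before taking the mode.
function m = EpsilonMode(values, epsilon)
    values = max(values, 0);                     % negative distances -> 0
    binned = round(values / epsilon) * epsilon;  % group values within epsilon
    m = mode(binned);                            % most frequent binned value
end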

Algorithm 10 Calculate Result Modes
1: procedure GETMODEOUTPUTTASK(outputFilePath, numberTrials, epsilon)
2:   while not eof:
3:     ReadLine
4:     if line = start_trial then
5:       current_trials ← 0
6:       while line != end_trial:
7:         current_trials ← current_trials + 1
8:         values ← line
9:         ReadLine
10:      while current_trials < numberTrials:
11:        values ← 0
12:        current_trials ← current_trials + 1
13:  if values % numberTrials != 0 then
14:    Print(Not multiple)

5.4.2 Neural Network

The code for our Neural Network was created using the Matlab Neural Network Toolbox. The toolbox's step-by-step on-screen instructions were followed to implement our design, outputting the function code for each predictor at the end of training.

Chapter 6

Methodology and Tools

This chapter focuses on the resources and methodology used during the implementation of the software and the research conducted for the project. We also present the reasoning behind the choice of tools and the distribution of time spent across the various stages of development.

6.1 Methodology

While the initial plan was clearly defined in terms of time spent, the development process was much messier than anticipated. There was much overlap between the steps, with no clear end points. During weekly meetings with the supervisor, the requirements and next steps changed often, requiring adaptability. The topic of this project is a current topic of research within computing, and the overall approach taken involved exploratory programming. Therefore, a straightforward software development paradigm, such as the Waterfall model, could not be used. Much of the time was spent trying out different approaches and changing previous designs where needed.

6.1.1 Research

A big part of the project was spent getting acquainted with the concepts and topics related to the upcoming work, in order to gain an in-depth understanding of the underlying theory involved in the project. Because this project makes use of a number of elements from related work (Abelha et al., 2015), it was necessary to examine the borrowed source code to better understand its functionality and integrate it correctly. Some lines of code had to be changed, with the consent of their author. The results of the research conducted are presented in Chapter 2. Getting acquainted with the software was mostly done through a hands-on approach, trying different functions directly; tutorials were used only in rare cases. While some research was done prior to the usage of these tools, most of the knowledge was gained during implementation.

6.1.2 Software development

The overall software development contained elements of exploratory programming, with new ideas being tested and changed throughout the process. The main focus was fulfilling the functional requirements. Access to the software used in the related work was given through Dropbox. New features were initially tested with one main model before attempting any of the other models. If the initial attempt at implementing the feature functioned correctly, additional time was spent to ensure its scalability across all acquired models. Each feature was tested thoroughly and, if it needed changing at some point, it was tested again with all previously tested data. Due to this, bugs were very often due not to previous features but to the one currently being developed.

Usually, a local copy of a function was made on the system if it was modified. After the modifications were finished and tested for expected behaviour, the files were uploaded to the main Dropbox folder, overwriting previous versions. This was needed because newer versions might be required by other individuals with access to the folder. Software development and research overlapped to a high degree, as new possible solutions, features or technologies had to be researched before implementation. Although not all proposed features ended up being part of the project's solution, the time spent on them resulted in exposure to new technologies and concepts.

Debugging was easy in Matlab, which was one reason for its selection. In Matlab, functions can be tested without recompilation. Other algorithms can be called while one is paused, offering flexibility in bug-fixing and testing. Sections of data can also easily be pasted into the command line with immediate effect and output.

6.1.3 Backups

Keeping backups of the software and the current report was a high priority. The 3rd party cloud storage provider Dropbox was chosen to provide data redundancy. Local archived copies were also stored frequently on both machines used, to provide recovery in case of a disastrous data loss. A Git repository host such as GitHub, offering branching and merging, could have been more appropriate considering that the data needed to be accessed by multiple people. However, Dropbox was chosen due to everyone's familiarity with it and its overall ease of use.

6.2 Development tools

During implementation, a number of resources were used to develop the solution. We look at both hardware and software tools, giving arguments for their usage over other alternatives.

6.2.1 Hardware

The initial pre-existing pointclouds of tools were scanned using the Kinect 2 and the Artec Eva 3D scanner, with both giving good results, albeit with the Artec functioning significantly better and without needing to put paint on the objects, as was the case with the Kinect 2 (Abelha et al., 2015). There were two main PCs used throughout development. The first was the one on which the main project code editing was done, a Windows 10 64-bit machine with 8GB RAM and an Intel i-series CPU. The second machine, known as Bombadil, was used primarily for running the segmentation and the Gazebo mass simulation of tools, both of which run better on the Linux operating system. It was also where most of the tool generation took place, as it was more convenient to have the modified tools created directly in the folder where simulation took place; the first machine was also not significantly faster during generation in comparison to the older one. The decision to use this second machine was mostly due to it containing the CPC segmentation software and due to its operating system. Dual booting the first machine with Ubuntu and installing the needed software would have taken an unnecessary amount of time, and dual booting could have created compatibility issues.

The possibility of using the University of Aberdeen's Maxwell cluster was considered, but the processing power needed for this project did not warrant the usage of such a powerful machine. The other two machines were sufficient, with one at times being used for data manipulation and the other for editing the software code.

6.2.2 Software

MATLAB, version R2015b (8.6), developed by MathWorks (uk.mathworks.com), acted as the main programming language and programming environment. Matlab was chosen for the ease with which machine learning techniques and mathematical concepts can be used, and for its collection of toolboxes containing easy-to-use functions. Matlab is also multiplatform, which was useful for the project, as it was used on both the Windows 10 and the Ubuntu machine. In general, it proved to be an intuitive and easy-to-use language, with features such as dynamic typing making it easy to start coding. A major downside of Matlab is that it is proprietary software and requires a paid license. While the student license was enough for some of the toolboxes, others, such as the Robotics System Toolbox, needed a separate license. As such, trial periods had to be registered, causing some confusion with license activation on different machines and limiting parts of development. At times, Matlab also used large amounts of memory, but this was not a major issue on either machine. It offered many advantages compared to the alternatives and made integration of the borrowed software easier, as it was also written in Matlab. The toolboxes most used are:

Computer Vision System Toolbox
Image Processing Toolbox
Neural Network Toolbox
Parallel Computing Toolbox
Robotics System Toolbox

SketchUp, version 16.0, was used for converting models downloaded from 3D Warehouse and for cropping parts of models in cases where models were part of a scene. Results from the past work used for evaluation comparison were stored in .xlsx format; Microsoft Excel was used to view them. Segmentation used the CPC algorithm code, found in the PCL library (Rusu and Cousins, 2011). Gazebo, a robotics simulation software, was used for our task simulations. As the original code of the simulation world, written by Krasimir Georgiev, was in C++, I gained some exposure to this language as well. The Linux Ubuntu operating system and some bash scripting were needed for the project, so some experience using these technologies was gained during implementation. Meshlab was used for the visualisation of the models, conversion of file types and filters. The screenshots of tools within this project were made in Meshlab.

Chapter 7

Evaluation and discussion

In this chapter we evaluate the results of our learning functions as well as the capabilities of the software. This is done through comparisons with simulation results, related work results, human ratings and direct observation results.

Description of procedure

Due to the time frame of the project, we focused exclusively on the hammering task. A hammer was chosen out of the previously segmented hammers as the ideal, or standard looking, hammer. Although it was not as well segmented as others, having 7 segments instead of an ideal 3 or 2, this gave us an opportunity to test our system and make sure it managed well with more segments. The model chosen was hammer_5, seen below in Figure 7.1.

Figure 7.1: The chosen hammer model shown segmented.

The tool generation was conducted by iterating in small amounts through each parameter of the model. The modification amount was small enough to go through a wide spectrum of possible shapes. We simulated each model 10 times before taking the mode of these 10 trials as the final result. The mode epsilon, for grouping near-equal trials, was taken as 1 millimetre. The results were then used as the target data for the neural network, with the input data composed of the model vectors of the tools resulting from the generation. The model vectors of the other tools used in the comparison are extracted using the same function used during generation (ExtractModel).

The X, Y, Z and center_distance values presented in the results are in meters. Shape 1/2 take values between 0 and 2, while Tapering 1/2 take values between -1 and 1; anything outside these ranges is clamped to the nearest boundary. Angle z and angle_centers are in radians. After training the neural network, we check how well the prediction function performs. The outputs of the predictor functions, shown in their individual columns, are in centimetres; they represent the predicted distance the nail would go into the wood if hit by the object.

Three predictor functions were evaluated. The first, Hammer Only, abbreviated to HO, was trained using just hammer generation data, with a total of 362 models. The second function, HP+I, used hammer as well as pencil model generation data, seen in Figure 7.2, plus the inertia data of each model. This was done to check whether using training data from more than one type of tool makes a significant difference, especially for predictions on the two types of tools used. We chose a very good and a very bad tool to cover a wide range of parameters. Our third predictor function, HP-I, uses the hammer and pencil data but excludes the inertia data from the neural network inputs. This is done to see whether inertia makes a significant difference to our predictions. HP+I and HP-I were trained on 716 models, 362 of which are the same as for HO.

Figure 7.2: The chosen pencil model shown segmented.

We check how each predictor manages for models within the same category and within a mixed category of tools. Within the mixed category comparison, we use a number of different types of data to evaluate our predictors from different perspectives. Where needed, we abstract the resulting distance into the categories used in the compared data, for better visualisation. The categories of each are shown in the columns immediately to the right of the results. The accuracy for each tool is the absolute difference between the result category and the predicted category. We show the overall accuracy of each predictor by summing the accuracy of each tool and dividing it by the number of items in the comparison; the closer the accuracy value is to 0, the better.

Based on this selection of functions, we present the following hypotheses. HP+I will make more accurate predictions for the tools it was trained on, in comparison with function HO, which will make more unrealistic predictions for pencils and similar tools that were not part of its training data. This should not affect the predictions for the inner category hammer tools, as both

should be able to predict hammers correctly. Our next hypothesis states that inertia will make a significant difference, making function HP+I a better predictor than function HP-I across all categories. This should be especially obvious with tools of the same category made of different materials.

7.1 Mixed category predictions

In this section, we see how well each function manages with a wide variety of tools, most of which it has not been trained on. Of the tools used, twenty-five were the same models used in the related work (Abelha et al., 2015). These models were acquired by scanning real objects. Intuitive grasping of the tools was used for predictions and simulations. The list of tools used within this category and their parameters can be seen in Figure 7.3. An explanation of these parameters can be found in Section 4.1.

Figure 7.3: List of tools and their parameters used within the Mixed Category predictions.

7.1.1 Simulator Results Comparison

In our first comparison, the tools were simulated using the same simulation world and set-up used in the generation. Since the simulator was the source of our training data, this comparison is seen as the main accuracy comparison of the project. The testing and data collection, seen in the Sim Results column of Figure 7.4, was conducted by Krasimir Georgiev as part of his honours project. We categorise the results for better visualisation, using the same criteria as in the Direct Observation Comparison: >=10 mm - very good; >5 mm - good with effort; >1 mm - slightly effective; <=1 mm - no good. The accuracy columns check how close the results for each tool are to the Sim Results column.

As we observe from the table in Figure 7.4, function HO is by far the worst. While it gives good results for hammers and frying pans, tools with parameters it has seen during training, the rest of the predictions are inadequate, with few exceptions. The bottle opener and scissors might be seen as less usable due to their small size and the angle between their segments. It is curious why the rest of the tools, similar in size and angles to the scissors, are seen as being good at hammering when the scissors got such a low prediction. HP+I is by far the best, as seen in Figure 7.5. It gives good results, especially for the pencil,

which it has been trained on, and for similar tools, such as the chopstick. The spatula and teapot are still predicted wrongly. For the teapot, this might be because its action part is angled unusually relative to the handle, which the network has not observed before. The same can be said for the mug. Selection of the grasp point centre is likely very important for items with a circular grasp, like these.

Figure 7.4: Results in comparison with simulator results.

While the HP-I function did well, its overall accuracy was worse than that of the second function. We thus see that inertia plays a big part in the usability of a tool. We can observe significant differences between HP+I and HP-I in Figure 7.5 for tools such as the wooden spatula or the teaspoon. The wooden spatula is seen as inadequate by HP+I, while it is seen as very good by HP-I. HP+I, taking into account its material and therefore its inertia, is more accurate in its predictions.

There is a clear difference in the predictions once more than one tool is used for training. The pencil is a very different tool to the hammer, and as such the training was able to cover a wider range of tools. Thus, more accurate predictions are made for the pencil, key and chopstick. Adding inertia made the predictions more sensitive to the possible material of the tool, which is why items like the rolling pin are indeed correctly predicted to perform worse in the simulation.

7.1.2 Related Work Results Comparison

In this section, we compare our predictions with the predictions given to the 20 tools by the system created by Abelha et al. (2015), published at the International Conference on Robotics and Automation (ICRA). That study used hard-coded functions for each task, stating how important each parameter was for the overall prediction, and relied on a good deal of human judgement. We want to see whether our machine learning approach performs similarly. The paper used a usability rating system from 1-10, with the following categories: does not work; could work; works best. For better visualisation, we abstract our previously predicted distances such that everything above 10mm is considered a 10, everything below 1mm is a 1 and every 1mm in-between represents a different step. Here, accuracy represents how close our predictions are to the ICRA predictions.

The results from the related work gave good predictions in comparison with human ratings

of the tools. Some of the items it did not do too well on were the spoons, scissors and spatula. Our system's HP+I arguably gives a better prediction for the spoons, but this could simply be due to the impact a low inertia has on the function. The pencil training data likely had a big impact on the way inertia is interpreted for smaller objects, considering many of the pencil models scored very low.

Figure 7.5: Accuracy comparisons of each function for all tools. Smaller bars represent better accuracy.

Figure 7.6: Results in comparison with related work results presented at ICRA.

The first function is again significantly worse than the other two, as seen in Figure 7.7, further consolidating our hypothesis. Surprisingly, the HP-I function's predictions were similar to HP+I's. This could be explained by the fact that the hard-coded functions from the ICRA paper did not take mass and inertia into consideration, which is why our function not using inertia was closer. Overall, the ICRA predictions are closer to reality than our system's, with the knife, rolling pin, scraper and teapot having significantly better predictions in the ICRA paper. In defence of our system, these items are very different from the two tools we used for training the 2nd and 3rd predictor. The differences between wooden items were not as noticeable because the ICRA paper did not use inertia,

thus making both predictors similar.

Figure 7.7: Accuracy chart for the related work comparison.

7.1.3 Human Rating Comparison

In this section, we check our predictions against how humans have rated the usability of the tools, based on 2D rendered pointcloud images. This is done to compare our system's predictions with human judgement. The ratings were taken from the related work (Abelha et al., 2015), and we used the same category criteria as in the previous comparison.

Figure 7.8: Human Rating comparison results.

The significant differences for the HP+I function can be seen with the serving spoon and wooden spoon. Humans have a more optimistic view of the usage of small objects; our 2nd function, having been trained on a small object, is more pessimistic about small, wooden objects. Surprisingly, HP-I again has overall closer predictions. It is possible that adding inertia to training does not make a big overall improvement except for models where the material is important. This could also be because most of the tools tested are made of stainless steel; if the tools had more diversity in their materials, the difference made by adding inertia data would have been more significant in all our mixed comparisons. It is also hard for humans to

correctly decide on usability based on images, without actually feeling the weight and inertia of an object. Thus, the predictor without inertia is not significantly worse.

Figure 7.9: Human Rating comparison results.

7.1.4 Direct Observation Comparison

The real items, the sources of the models used previously, were tested in a real-life hammering task. We call the resulting data the ground truth. As the simulator was set up to imitate soft wood, the ground truth check used soft wood as well. We abstract our predictions into the values seen in the legend of Figure 7.10.

Figure 7.10: Ground Truth comparison results.

The overall tendencies are the same as in the previous subsections. This makes sense, as the ICRA and human ratings are similar to the overall tool behaviour. The key is not seen as an inadequate tool: although it is a small item, its material is probably what saved it from being labelled no good. One surprise is the knife being labelled good with effort, as both the 2nd and 3rd functions saw it as inadequate. One possible reason is that the knife is really thin. While it is good for hammering, being heavy enough and having a wide side, the neural network

has not seen thin objects performing well. All the hammers in the training set that had one small shape parameter (under one centimetre) performed poorly. Considering the pencil is also thin, the penalty for having any small shape parameter might be very high. The same can be said for the ladle and scraper. This can be seen in Figure 7.11.

Figure 7.11: Ground Truth comparison chart.

7.2 Inter category predictions

We check how our system predicts the usability of the other hammer models we acquired previously. We do this to examine the system's behaviour with a wide range of one type of tool. The hammer models used within this comparison and their parameters are shown in Figure 7.12.

Figure 7.12: Inter category tool models.

The most unexpected aspect is how badly the HP+I function behaved in this section. While the function trained exclusively on hammers gave good predictions for all the models, the second heavily punished almost all the other hammers. Since function 3 gives good predictions, we can conclude that the punishment in the 2nd function comes from the inertia. Hammer 7 is given a huge score by the 2nd function. This might be due to a larger segment being selected as the action part, resulting in a larger inertia, as the hammer was segmented strangely, making a wrong selection likely.

The situation is all the more surprising considering that the hammer in the mixed category evaluation was predicted correctly. This leads us to believe that there might be an error in the inertia

calculations, resulting in different values for the same type of models. Another, more likely, possibility is that the neural network weights for the inertias were trained to be unrealistic. This might be due to the relatively small amount of training data: more than half of the training results are zero, making it likely that some parameters are overly punished.

Figure 7.13: Inter category Hammer results.

7.3 Discussion

As we concluded before, there was a definite need for data generation. We now have to ask whether our chosen procedure for data generation was the right one. The predictors, while HP+I performed best, were not adequate in our comparisons. The most likely explanation is the lack of training data covering all possible types of tools. Using just one type of tool for training, as seen with function HO, leads to very poor results, with the exception of tools similar to the one used as the basis of training. Training with different types of tools results in better predictions across the spectrum, as seen with function HP+I. Regardless, some types of tools were never part of the training, resulting in bad predictions; good examples in our comparisons are tools with one small shape parameter, such as the knife and spatula. Our data generation does not cover a wide enough range, and it is possible that approaches like the Gaussian Mixture Model, Section 2.6, might provide better sampling of parameters, despite our initial dismissal of its usage.

The first hypothesis is proved, as the first predictor performed abysmally across all categories in comparison with the second and third. Our next hypothesis, that adding inertia gives a significant advantage, is not proved, as in many cases the HP-I function gave similar, or even better, overall predictions. The differences could have been more significant with a wider range of tool materials. The biggest surprise is how well the HP-I function performed compared to the one encompassing inertia in the inter category comparison. This could be due either to an undiscovered error in our inertia calculation, leading to a big penalty in our predictors for some inertia values, or to inertia parameters being harshly punished due to the training data. Overall, we can conclude that HP+I is the best predictor over a varied range of tools, due to its similarity with the simulator results and the ground truth.

7.4 Overall Performance

The solution functions correctly across all types of tools. We can take any model, segment it, fit superquadric shapes onto it to obtain parameters, and then either generate altered models from it, creating a neural network prediction function, or test the parameters on an existing function. The parameters returned are representative for all the tools used in our comparison. The shapes of different tools were manipulated well throughout all parts of our solution (segmentation, rotation etc.), regardless of the number of segments. We can thus conclude that our software scales across multiple types of tools and can theoretically be used for any type of pointcloud.

Chapter 8

Conclusion and future work

8.1 Conclusion

Our approach attempted to learn the affordances of tools through synthesized models. We attempted three different approaches for our functions. While the data suggests that training based on a large variety of tools with inertia information tends to give better predictions, we have to re-evaluate our approach for acquiring training data. Although we initially decided against a Gaussian Mixture Model, our approach of modifying each parameter individually did not provide enough representative data. We attempted to acquire a wide spectrum of tools using few initial models, but this did not prepare the neural network for vastly different models. The biggest surprise was to find that our attempts at including the inertia of a model resulted in vastly worse results in our inter category evaluation. This is most likely, again, due to our data not covering the wide spectrum of possible tools: more than half of our data instances result in a zero, which might have caused the weights of the inertia parameters to be unrealistically punishing.

On the other hand, a good amount of work was put into developing the process of preparing and altering models. We can now take any model, segment it, fit superquadric shapes, alter segments of it and prepare it for Gazebo simulation. This approach remains useful for future work, where we will be using superquadric shapes for pointcloud manipulation, as the parametrisation of this approach is simple and representative. We have managed, within the time frame provided, to implement all the preliminaries required for our approach and to obtain initial results, giving us a good idea of the direction for our future work. The project provided good experience and insight into an active domain of research. Overall, most of the primary and secondary goals have been achieved to a good degree.

8.2 Future work

There are two approaches we can take for future work. We could continue attempting to improve our predictor functions using the same method of data accumulation. This can be done by using more models as the basis for generation, resulting in a wider range of tools. We could also attempt a different machine learning technique instead of a Neural Network; considering that the hand-coded functions of Abelha et al. (2015) were linear and quadratic, we need another technique able to learn quadratic functions. The other approach involves changing the way we acquire our data. While online models could still be used as a basis, we could implement a method of sampling able to cover a wider range of parameters, such as a Gaussian Mixture Model.
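A minimal sketch of what this could look like with the Matlab Statistics Toolbox is shown below; the file name, component count and regularisation value are illustrative choices, not tested settings.

% Fit a Gaussian Mixture Model to the existing model vectors and sample new
% parameter vectors from it, instead of varying one parameter at a time.
vectors = importdata('input.txt');                       % one row per tool
gm = fitgmdist(vectors, 3, 'RegularizationValue', 0.01); % 3 components
newVectors = random(gm, 100);                            % 100 synthetic vectors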

Further improvements can be made to the automation of our functions. We could improve the software to decide whether a model is graspable: while a human can grasp nearly anything, some verification could be done on, for example, the size of the handle. We could also try to find which segment is the grasp segment instead of using human input. The issue is that this remains an area of active research within computer vision, so we might not want to concentrate our time on this feature.

Segmentation was a process of trial and error, with the parameters of the software being changed until we had a clearly defined action and grasping part. In the future, this could be done automatically, as this part currently requires human judgement. One possibility is changing the parameters iteratively until a small enough number of segments is present (two or three); if more than one model results, we can simulate each possibility, taking the model which returns the biggest score.

Future work might also involve making generation and simulation a single element, implementing generation cut-offs when usability decreases, as initially discussed in the Initial Model Generation design section. If we acquire different simulator worlds for different tasks, we would need to implement more tool generation functions using different parameters depending on the task (for example, gap size for Retrieving Object).

Different scaling should be used for different tools. This is an issue, as models found online come in many different ranges of sizes. We need a generic scaling function, but scaling is currently done mostly through human knowledge of the size of tools based on their category. We should not have to categorise tools using human knowledge, as we want our system to decide the affordances of generic objects.

Appendix A

User Manual

This chapter explains how to run the software developed for this project. The software is intended for research rather than general use, so the instructions are abstracted to that level of knowledge. Basic running information is given for third-party software such as Meshlab and the CPC segmentation tool; more in-depth information can be found in their individual manuals online.

A.1 Usage Instruction

We will look over how each major part of the project is run properly.

A.1.1 Matlab

Step 1: Open the file with Matlab by pressing Open on the Home tab and browsing for the file needed.

Step 2: Make sure the folder path is set to the submission folder. If not, change it by pressing Browse for folder to the left of the current path display and selecting the submission folder. This makes it possible to run functions by writing their name in the command window.

Step 3: Run the desired function by writing its name and parameters in the command window. For example:

    [P_orig,model_orig,SQs_orig] = PcdToSim('D:\Submission\Models\', 'triangle_hammer_5_out.pcd', ['ham'], [], [1;0;0], 0.078, 0.325, 0.05, 'stainless_steel')

A.1.2 Tool Generation

This file encompasses a good part of all the functions developed during the project and makes use of close to all parts of the design during its run. To run tool generation:

Step 1: Some variables need to be changed for the system it is run on. fid = fopen() needs to point to a text file where the vectors will be written. The variable gazebomodelpath in the file CreateGazeboModelFolderStructure needs to be changed to the desired Gazebo world folder.

Step 2: Change the variables to the type of generation desired. modification is the initial change to a model. The loop on line 7 controls how many altered models are produced and can be changed. The modification change on line 12 sets what alteration is applied at each iteration.

Step 3: Run the function PcdToSim with the needed parameters; the example above can be used.

Step 4: When prompted, click on where the tool should be grasped and where it will make contact with the nail, as seen in Figure A.2.

Figure A.1: Steps to running Matlab.

Step 5: This results in the folder structure presented in the design, with each folder containing its required files and altered model.

Figure A.2: User interface asking the user to click on a grasp and an action point.

Example runs: Some examples of tool generation runs (a combined sketch follows below).

Example 1: Change new_model_vector to new_model_vector(1) to alter the first parameter. Change the modification on line 13 to modification + 0.005 to add half a centimetre at each iteration. The loop on line 8 can be changed to i=1:10 so that 10 models are created. This results in 10 models altered on the X parameter.

Example 2: Change to new_model_vector(8) to alter the angleZ. modification is changed to modification + pi/20 for a 9 degree change at each iteration, and i=1:30 gives 30 models. This results in 30 models, each differing by 9 degrees from the previous model.

Each individual part of the solution can be run independently in the same manner. A breakpoint can also be set by clicking on the desired line of the code; at that point, Matlab has access to the data of that session within its workspace. This data can then be used to run other functions by writing the desired function into the command window as you would any function.
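The sketch below combines the two example runs into one loop. The names new_model_vector and modification follow the manual; the surrounding scaffolding (paramIndex, step, altered_vector) is assumed for illustration and does not match the real script line for line.

    % Sketch of the generation loop described in Examples 1 and 2.
    % new_model_vector holds the superquadric parameters of the base tool.
    paramIndex   = 1;       % 1 = first (X) parameter, Example 1; 8 = angleZ, Example 2
    modification = 0;       % initial change applied to the chosen parameter
    step         = 0.005;   % half a centimetre per iteration (pi/20 for angleZ)

    for i = 1:10                                   % number of altered models
        modification = modification + step;
        altered_vector = new_model_vector;
        altered_vector(paramIndex) = altered_vector(paramIndex) + modification;
        % Each altered vector is then written out and prepared for Gazebo,
        % as done by the real tool-generation script.
    end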

Figure A.3: Range of modified hammer models.

To run the mass Gazebo simulation:

Step 1: Copy the Gazebo.sh file into the Gazebo model folder.

Step 2: Run it with the command ./Gazebo.sh in a Unix terminal.

To run mode calculation (a sketch of this computation follows Figure A.5):

Step 1: Change the output.txt path to where the results of the simulation are located.

Step 2: Run the function with the command GetModeOutputTask in the command window.

Step 3: Get the modes by writing modes; this outputs the data structure containing the modes.

To run the Neural Network predictors: The neural network predictor scripts are HammeringTask, HammeringTask2 and HammeringTask2NoInertia. Each can be run by writing the function name with the model vector as parameters. Example: HammeringTask2NoInertia( ). HammeringTask and HammeringTask2NoInertia need 10 parameters; HammeringTask2 needs 13.

A.1.3 Meshlab

To apply a Meshlab filter to a pointcloud:

Step 1: Open the model file via File/Import Mesh.

Step 2: Run the needed filter by pressing Filters/Remeshing, Simplification and Reconstruction, then selecting the filter wanted.

Step 3: Export the new model via File/Export Mesh.

Figure A.4: Meshlab File and Filter tabs.

A.1.4 CPC Segmentor

Change directory to the build folder; the model to be segmented should be in the same folder. Call the terminal command with the needed parameters. Example commands can be seen in the last column of the table in Figure A.5.

Figure A.5: The segmentation commands used for the initial tools.
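As referenced in the mode calculation steps above, the following is a minimal sketch of what GetModeOutputTask computes. The output.txt format (one model name and one score per line) is an assumption for illustration, not documented behaviour of the simulator output.

    % Sketch (assumed file format): read simulation scores from output.txt
    % and take the mode of the repeated runs for each model.
    fid = fopen('output.txt', 'r');              % path set in Step 1
    data = textscan(fid, '%s %f');               % assumed columns: name, score
    fclose(fid);

    [names, ~, idx] = unique(data{1});
    modes = struct('model', {}, 'score', {});
    for k = 1:numel(names)
        modes(k).model = names{k};
        modes(k).score = mode(data{2}(idx == k)); % Matlab's built-in mode
    end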
