AUTOMATIC CHARACTERISATION OF BUILDING ALIGNMENTS BY MEANS OF EXPERT KNOWLEDGE


Ruas, A. and Holzapfel, F.
Laboratoire COGIT, IGN. 2 Avenue Pasteur, Saint Mandé Cedex, France. Fax: anne.ruas@ign.fr

ABSTRACT

Automating the generalisation process requires a system that can determine automatically which objects to generalise and how. Previous research has shown that identifying groups of objects helps the system to generalise together objects that share properties and/or that collectively carry a geographical meaning. Groups of objects such as towns and urban blocks have already been detected and successfully introduced in the AGENT generalisation prototype. This paper focuses on the specific case of building alignments. A previous paper [CHR 02] already proposed a method to detect such structures. This paper presents the next step: the qualitative characterisation of the alignments, in order to retain automatically those that are significant and should be maintained during the process. The characterisation is also used to drive a relevant typification of the retained alignments. For the generalisation process, an alignment is important if it is regular in terms of alignment, size, shape, orientation and inter-distances. In other words, an alignment is important if its buildings 'look the same' and their linear arrangement is regular. Computing a measurement for each criterion is not difficult, as many measures exist in the literature. Likewise, the standard deviation is already known to be a good way to compute regularity.
The first difficulty is to find thresholds that distinguish regularity from irregularity for each criterion. The second is to find an appropriate aggregation function that combines these measurements to distinguish regular from irregular alignments. After a presentation of our objectives, the paper describes the method used for each criterion to go from a quantitative value to a qualitative evaluation. Experts were asked to evaluate a set of alignments with marks (from 1, excellent, to 5, very bad) and we looked for functions relating these marks to the values computed by our measures. This technique allowed us to complete and improve our measures, and to find each evaluation function. The second step focuses on the design of a good aggregation function. The difficulty was to know whether some criteria are perceptually more important than others. In the same way, experts evaluated alignments globally and we used a cost function to compare expert evaluations with different combinations of criteria. In the end we obtained a subset of acceptable aggregation functions that were tried on new alignments. The paper presents results on concrete cases, as this work is fully implemented on the GIS LAMPS2 and has been tested on real geographical data from the IGN 1 m resolution vector database. Besides finding functions to retain important alignments automatically, this research gave us methods to introduce expert knowledge within generalisation sub-processes.

1. CONTEXT AND OBJECTIVES

The generalisation of buildings at medium scale (smaller than 1: ) in urban areas requires an enlargement of nearly all buildings in order to respect the readability constraints. This emphasising provokes proximity and density conflicts that can only be solved by removing buildings.
Although object removal is necessary, preserving the geographical meaning is also essential: the removal should not 'dramatically' change the intrinsic information held in the initial data. Consequently the 'object removal' process is viewed as a contextual process. The generalisation of a set of buildings is often named 'typification' to distinguish it from a mere object removal process. It encompasses removal, displacement, emphasising and possibly line simplification. Typification could be defined as the contextual generalisation of a set of close buildings that conserves the main spatial properties of the initial set of objects. We could also say that typification is the process that produces a new representation (of the same reality) that preserves as well as possible the main properties of the reality while respecting the new readability constraints.

Proceedings of the 21st International Cartographic Conference (ICC), Durban, South Africa, August 2003. 'Cartographic Renaissance'. Hosted by The International Cartographic Association (ICA). ISBN: . Produced by: Document Transformation Technologies

Let us name readability the test that checks whether the graphic constraints are respected (minimum size, granularity and distance constraints). Readability depends on the final scale. Given:

- E_ini, the initial space composed of a set of non-generalised building objects {O_i}, with readability{O_i} = false,
- E_gen, the space composed of a new set of building objects {O'_j} obtained by typification, {O'_j} = Typification{O_i},

a good typification could be defined such that:

- Card(E_ini) > Card(E_gen): some objects have been removed,
- readability{O'_j} = true: the new set respects the graphic constraints,
- main_properties(E_ini) = main_properties(E_gen): the main properties are maintained.

Of course, the complexity lies in the definition of these main_properties and in our capacity to develop measures to compute them. As a consequence, we want to add to a generalisation system, whatever it is, the relevant information by means of objects and attributes, to better control the process of generalisation, following the principle:

- If the system is not aware of the existence of something important (such as a pattern),
- Then it cannot preserve it, OR it preserves it by chance.

Figure 1 shows a generalisation of buildings extracted from the IGN-France BDPays to produce a map at 1: . The generalisation was performed with the AGENT prototype [1] at a time when it did not contain the concept of alignment. As the system does not know that an alignment exists, it can neither try to preserve it nor detect that it has been destroyed. The system is 'happy' with the solution because it is not aware of the degradation of this pattern.

[Figure 1. Example of automated generalisation at 1: without considering the concept of alignment.]

To detect the main properties that should be preserved during the process of generalisation, our approach is to move towards an explicit representation of the information used for decision making during the generalisation process.
Following this idea, instead of having an implicit representation of knowledge inside one generalisation algorithm, we prefer to create objects (and attributes) that represent the concepts used inside this algorithm. This explicit representation of knowledge aims at enabling new algorithms that should obtain better generalisation results. Moreover, the explicit representation of geographical concepts allows better control of the results, either during the process (as is done in AGENT [1]) or after generalisation [2].

The aim of this paper is to present a method to characterise building alignments for cartographic generalisation purposes. To be more precise, we begin with a set of building alignments detected by the method proposed in [3] and we want:

- to give a mark to each alignment. This mark is used to find the 'best alignments' that should be maintained during the generalisation process. The mark should reflect the regularity of each alignment,
- to use the characterisation of each alignment to generalise it as well as possible. The aim is to reduce the number of buildings within the alignment while preserving the main characteristics of the alignment.

Section 2 presents the conceptual data modelling used for this research. Finding the best conceptual model is a necessary step for automation, even if the implementation is rarely a faithful image of this model. An appropriate model simplifies rules, algorithms and controls. Section 3 presents the principles and the previous work. Section 4 presents the procedure we followed to acquire expert knowledge, and section 5 presents the results obtained in summer 2002.

2. THE CONCEPTUAL DATA MODEL

At the beginning of the process we start with a geographic data base (Geo_DB1) and we want to create a cartographic data base (Car_DB2). A cartographic data base is composed of objects that have constraints related to a map scale (like the DKM in Germany). For example, the cartographic objects are big enough and far enough apart to be readable. A cartographic data base is more or less ready to be printed, but it is still a data base and not an image. In contrast, a geographic data base is composed of geographic objects whose position and size are only constrained by the data base accuracy. In this section we describe an appropriate conceptual data model to go from Geo_DB1 to Car_DB2, concentrating on buildings only. To avoid ambiguity, we recall that geographic objects and cartographic objects are two different representations of reality: neither is more real or better than the other; the two representations simply follow different rules.

Let us say we have buildings in one class named Building_G1 in the initial data base Geo_DB1, and we want to create new buildings in the new class Building_C2 of the data base Car_DB2. Because of generalisation, the relationships between the objects of Building_G1 and the objects of Building_C2 are not trivial: Figure 2 (note 1) shows that there is no bijection between objects.

[Figure 2. Representation of buildings within an urban block for two scales (Car_DB1 with Building G1, Car_DB2 with Building C2).]

To perform this generalisation, the concept of Urban_block has been used. An urban block is a meso object defined by a cycle of streets that contains buildings. The concept of urban block is introduced by adding a class (named Urban_block) and creating objects. We say that one urban block object is composed of buildings, as we would say that a tree is composed of leaves. Urban blocks are objects created for generalisation purposes.
They can possibly be maintained at the end of the process if it is decided to update the cartographic data base Car_DB2 from the geographic data base Geo_DB1. If we go back to Figure 2, the relationship between the buildings of G1 and the buildings of C2 is that 'they belong to the same urban block'. Conversely, the same urban block can be represented by its component buildings of G1 OR by its component buildings of C2. The relationship between buildings is described by the relationships of composition.

[Figure 3. Adding urban block objects. Urban block: class (and objects) for the process; Building G1: class from Geo_DB1; Building C2: class from Car_DB2.]

In the same way, if the concept of alignment has been used to produce buildings of Car_DB2 from buildings of Geo_DB1, then the alignment is an appropriate object to describe the relationships between the buildings of G1 and the buildings of C2. Consequently the concept of alignment is introduced in the model by means of a new class and objects: we add a new class named Alignment, which is used to create a new set of buildings of the class Building_C2 from some aligned buildings of the class Building_G1 (Figure 4). An alignment is composed of a set of buildings of Geo_DB1 and another set of buildings of Car_DB2. We could also say that the same alignment is represented by a set of building objects in Geo_DB1 and another set of building objects in Car_DB2.

Note 1: In Figure 2 we name Car_DB1 a cartographic data base that would correspond well to Geo_DB1.

[Figure 4. Adding the class (and objects) Alignment to produce C2 buildings from G1 buildings.]

As for urban block objects, the alignment objects could be maintained at the end of the process if it is decided to update the cartographic data base Car_DB2 from the geographic data base Geo_DB1. To sum up, in order to generalise buildings at medium scale in urban areas, we need to introduce some meso objects. These new meso objects hold the contextual building operations such as building_removal, building_displacement and typification. A building holds independent algorithms such as Emphasising, Simplification and Squaring. The contextual generalisation of a building could be defined as follows:

IF a building belongs to an alignment
THEN it is generalised as a part of this alignment by means of the typification algorithm
ELSE it is generalised as a part of an urban block by means of the building_removal and building_displacement algorithms.

Figure 5 shows a conceptual data model used for the medium scale generalisation of buildings.

[Figure 5. Data modelling for urban generalisation including building, urban block, alignment and their algorithms. Alignment: Characteristics, Typification. Urban block: Characteristics, Alignment-detection, Building-Displacement, Building-removal. Building C2: Characteristics, Emphasising, Simplification, Squaring.]

For practical reasons, at the beginning of the process the building objects of the Building_G1 class are copied into the new class Building_C2. When they arrive in the Building_C2 class, none of them respects the generalisation constraints related to Car_DB2 (∀ O_j ∈ Building_C2, readability(O_j) = False). At the beginning of the process, the classes Alignment and Urban_block are empty. Methods are used to create the urban block objects. Each time an urban block is created, its relations of composition are instantiated (the relationship between one urban block and its buildings).
Then each urban block object looks for alignments by analysing its own buildings (method Alignment_detection in Figure 5). If one or more alignments are found, an Alignment object is created for each and linked to its component buildings. When this phase of data base enrichment is finished, the process of generalisation may start (one can also create alignments during the process if preferred). At the end of generalisation, the buildings of Building_C2 respect the constraints of Car_DB2 (∀ O_j ∈ Building_C2, readability(O_j) = True). They are very different from their reference objects of Building_G1 from Geo_DB1. The meso objects used for the process (here the alignment and the urban block objects) can either be removed or maintained for multiple representation purposes. If they are removed, they are ephemeral (or process) objects. Figure 6 sums up the four classes used for building generalisation.
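The data model and the contextual rule of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and attribute names (`Building`, `Alignment`, `UrbanBlock`, `ident`, `readable`) and the function `contextual_operation` are hypothetical and are not taken from the AGENT or LAMPS2 implementations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Building:
    ident: str               # hypothetical identifier
    readable: bool = False   # result of the readability test at the target scale


@dataclass
class Alignment:
    # The same alignment is represented by two sets of buildings.
    buildings_g1: List[Building] = field(default_factory=list)
    buildings_c2: List[Building] = field(default_factory=list)


@dataclass
class UrbanBlock:
    buildings_c2: List[Building] = field(default_factory=list)
    alignments: List[Alignment] = field(default_factory=list)


def contextual_operation(building: Building, block: UrbanBlock) -> str:
    """Apply the IF/THEN/ELSE rule: alignment members are typified,
    other buildings are handled at the urban block level."""
    for alignment in block.alignments:
        if any(b is building for b in alignment.buildings_c2):
            return "typification"
    return "building_removal + building_displacement"
```

For example, with a block holding buildings `b1` and `b2` where only `b1` belongs to an alignment, `contextual_operation(b1, block)` selects typification while `b2` falls back to removal and displacement.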


[Figure 6. Data modelling including the reference (Building_G1) and the meso ephemeral objects (Alignment, Urban block), together with Building_C2.]

Of course, this proposed data model is neither unique nor complete. It illustrates the data base enrichment needed to carry out generalisation and possibly to move towards multiple representation databases. The data model presented in Figure 5 is used for the research work presented hereafter.

3. PRINCIPLES TO CHARACTERISE THE ALIGNMENTS

3.1 The detection of alignments

In previous work [3] we managed to detect alignments automatically inside each urban block. The principle is based on the projection of building centroids on an 'exhaustive' series of lines inside an urban block. Clusters of points are detected: they represent potential building alignments (see Figure 7).

[Figure 7. Projection of centroids on lines and building alignment detection in [π, 2π], from [3].]

Even though it is not optimised, the method seems to be robust: experimental studies show that it detects the alignments very well. But at this step, all alignments are detected whatever their perceptual quality. The next step of this work was to characterise these detected alignments in order to get rid of the 'non perceptual' ones, i.e. the ones that are not regular enough. To retain only the regular alignments, we proposed to look for the aligned buildings [3]:

- that are distributed in a regular way (their distance is regular along the line),
- and that look the same in terms of shape, size and orientation.

3.2 The quality of an alignment

Given a property p computed by a measure m, an alignment composed of n buildings has n different values of p. The alignment is p-regular if the n values of p computed with m are nearly the same. r is the measure that characterises the regularity of the property p, i.e. the similarity of the values computed with m. If p is the size, an alignment composed of 5 buildings is characterised by a set of 5 different size values.
The alignment is size-regular if these 5 values are nearly the same. We define the quality of an alignment as being its global regularity. The global regularity is a specific aggregation of all p-regularities. Given p_i the properties computed, the objective is thus to find a function f such that:

$$ \mathrm{Quality}(\mathrm{alignment}_j) = f(p_1, \ldots, p_i, \ldots, p_n) \qquad (1) $$

To develop f it is first necessary:

- to identify all properties {p_i},
- to find a measure m_i for each property p_i,
- to find a measure r_i for the regularity of the property p_i,
- to develop f from the r_i.

The first point is to identify the main relevant properties (see Figure 8). For this step, we simply looked at maps. Looking at different alignments helped to identify easily a first set composed of 1/ the geometry of the alignment, 2/ the regularity of the distance between buildings, 3/ the regularity of the shape, 4/ the regularity of the size and 5/ the regularity of the orientation.

[Figure 8. What makes a pattern a 'good' one? Panels: alignment, distances, shapes, size, orientation, mean distance (stretching). QUALITY = SUM of all regularities.]

After experiments, we added a measure of density or stretching, which expresses the fact that 1/ it is more difficult to perceive an alignment if the buildings are far apart (the mean distance is high) but 2/ the acceptable distance also depends on the mean size of the buildings: the bigger the size, the higher the accepted distance. The first experimental study [3] showed that designing the measures m_i is not very difficult, nor is designing the regularity measures r_i, but finding f, the aggregation, is very complex.

3.3 The regularity of the property: choices and difficulties

An alignment is initially characterised by 1/ a number n of buildings and 2/ a geometry, which is the segment computed by regression on the centroids of the buildings (see Figure 7, right). This segment is named the centroids_segment. The regularity r_i is generally based on the standard deviation σ taken about the median value (not the average, because there are too few objects):

$$ \sigma_i = \left( \frac{1}{n} \sum_{j=1}^{n} \left( m_i(b_j) - \mathrm{median}_i \right)^2 \right)^{1/2}, \quad \text{with } b_j \text{ a building},\ j \in [1, n] \qquad (2) $$

Different measures m_i and r_i have been tested. Since the first experiments [3], some small changes have been made to improve the results: we chose more robust measures of alignment and orientation. We also added the measure of stretching. Moreover, the measure of elongation (shape) used previously has been abandoned. For our experiment, 6 measures of regularity have been chosen.
We recall that m is the basic measure of a property p on each building, and r is the regularity, often computed from m and σ (equation 2). The chosen measures m and r are simple. The measure of orientation is a little more complex; it is not described here but in [4].

- Alignment: m = distance from the centroid of each building b_j to the centroids_segment; r = average(m); unit = metre
- Distance: m = minimum distance between two consecutive buildings b_j, b_j+1; r = σ computed with (n-1) values; unit = metre
- Shape: m = concavity = Area(b_j) / Area(convex_hull(b_j)); r = σ; unit = none
- Size: m = Area(b_j); r = σ; unit = m²
- Orientation: m = wall_orientation = main wall orientation, not an average [4]; r = σ; unit = degree
- Stretching: r = average(distance(b_j, b_j+1)) / sqrt(average(size(b_j)))
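The regularity measures listed above can be sketched as follows. This is a minimal illustration, assuming that the per-building values m (areas, concavities, distances to the centroids_segment, gaps between consecutive buildings) have already been computed by a geometry library; the function names are hypothetical and the orientation measure of [4] is not covered.

```python
from math import sqrt
from statistics import mean, median


def regularity(values):
    """Equation (2): standard deviation taken about the median.

    `values` holds the measure m for each building (or the n-1 gaps
    between consecutive buildings for the distance property).
    """
    centre = median(values)
    return sqrt(sum((v - centre) ** 2 for v in values) / len(values))


def alignment_regularity(offsets):
    """Alignment: average distance from each centroid to the centroids_segment."""
    return mean(offsets)


def stretching(gaps, areas):
    """Stretching: mean gap between consecutive buildings divided by
    the square root of the mean building area."""
    return mean(gaps) / sqrt(mean(areas))
```

With made-up values, `regularity([2, 4, 4, 4, 6])` is about 1.26, and five identical values give 0, the perfect regularity. Dividing the mean gap by the square root of the mean area makes the stretching measure dimensionless, which matches the remark that larger buildings tolerate larger gaps.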

Of course, as the properties are different, their units and their domains of values are very different from one another. As an example, for one alignment, the building sizes vary between 100 and 1000 m², the distances between buildings vary between 6 and 30 m, the orientations vary from 0 to 2 d, the concavity stays between 0.9 and 1, and so on. As a consequence, the standard deviation is also very heterogeneous from one property to another. Classical normalisations of values (such as new_value = (max - value) / max) have been tested, but a simple analysis of the results showed that the qualitative meaning behind each normalised value sometimes changes dramatically from one property to another (Figure 9, left), and consequently an aggregation of these values to compute a global quality would be meaningless. A classical normalisation is therefore not suited to what we are looking for. To be able to 'trust' a number, we need the same value to carry the same meaning, whatever the property (Figure 9, right).

[Figure 9. Towards a qualitative normalisation of the properties? Panels: Size, Inter distance, Prop j, each on a scale from bad to good.]

4. THE USE OF EXPERT KNOWLEDGE

4.1 Supervised learning techniques

The difficulty of finding a good aggregation function is not a surprise, as there is no mathematical logic that would intrinsically distinguish what is regular from what is not. Mathematical measures are ideal 1/ to define a perfect regularity (the values are exactly the same) and 2/ to order two different alignments for the same property. The limit between what is regular and what is not depends on perceptibility criteria that are human by definition. We have consequently chosen to use expert knowledge to distinguish good from bad alignments, going from quantitative values to a qualitative interpretation of these values. Supervised learning techniques are well suited to this kind of difficulty.
To sum up their principles, a population (here a set of alignments) is characterised by a set of measures (m_i or r_i). An expert visualises this population and classifies each element (in our case the classification would be a qualitative mark such as A, B, C, D or E). A learning algorithm connects the measures of the population with the classifications given by the expert. The learned 'rules of classification' are then used on a new population for which no classification exists. In order to validate what has been learned, a part of the population classified by the expert is not used for the learning step but is reserved for validation. If the system classifies this validation set as the expert did, the learning is correct.

Different supervised learning techniques exist. The well-known neural network can be very efficient with a large set of examples but has the great disadvantage of producing a 'black box': the rules of classification are not formalised in a readable way. Symbolic supervised learning is a very interesting alternative, as it produces rules that are formalised in a symbolic way. Algorithms such as C4.5 [5] or ID3 are available for that. The use of learning techniques has long been advocated for the automation of generalisation [6], but experiments have shown that the results obtained are rarely as good as expected. We believe that this comes from the learning procedure. In the next section we present the learning procedure we have chosen.

4.2 A learning in two steps

From equation (1) we have already identified the required properties p_i and proposed a way to compute r_i. The expertise is introduced by one (or several) experts who visualise an alignment and give it a mark.
Given A = {al_j}, where al_j is an alignment defined by 6 measures (r_1, ..., r_i, ..., r_6) and by a mark mark(al_j) given by the expert, we are now looking for f such that:

$$ \mathrm{Quality}(al_j) = \mathrm{mark}(al_j) = f(r_1, \ldots, r_i, \ldots, r_6) \qquad (3) $$

Of course there is always a certain level of imprecision in such work, so the determination of f requires a large number of marked alignments.

To find f, one possible process could be:

- to ask experts to mark a set of alignments from A (good) to E (bad): five classes are defined,
- to compute the regularity r_i on each of these alignments,
- to use symbolic supervised learning algorithms to create rules of classification and to analyse the results.

Because of previous experience, we made another choice. Different researchers of the COGIT laboratory have already used learning techniques for data matching or generalisation [7,8]. These studies confirm, if confirmation were needed, that these techniques propose a good classification only if 1/ the problem is very well defined and limited and 2/ the measures are good and the examples are numerous enough. More specifically, the PhD of Sébastien Mustière [8] is very useful for a good understanding of this type of technique and its use in our context: to obtain real added value (i.e. a result which is clearly better than rules defined empirically), the process has to be decomposed as much as possible into logical and well-defined steps. Each step can then be carefully controlled and possibly revised. Here again, it is not new to recall that we should divide problems into sub-problems if we want to understand and master them. As a consequence, we have chosen:

- to decompose the process of learning into simple subtasks,
- to use, as much as possible, very simple learning techniques, because mastering the learning process helps to understand the knowledge contained in the data.

Instead of seeking f directly from the r_i, we first normalise each r_i by means of expert knowledge (step 1) and then (step 2) use expert knowledge again to build f from the new normalised values q_i. This procedure also has the advantage of producing, at the end of the process, not only a global mark but also one mark per property, with all marks being comparable.
Eventually, once the logic for building the property aggregation function that computes the global quality is well understood, another approach using a 'classical' supervised learning technique could be envisaged.

STEP 1: We ask an expert to mark the regularity of each property and we look for six appropriate functions f_i that give a qualitative value of a property from the quantitative value r_i.

$$ f_i : D_i \to D \subset \mathbb{R}^+, \quad r \mapsto f_i(r) = q_i \qquad (4) $$

STEP 2: We ask an expert to mark the general quality of each alignment and we look for an appropriate function f that gives a qualitative value of the alignment from the normalised qualitative values q_i.

$$ f : D \times D \times D \times D \times D \times D \to D \subset \mathbb{R}^+, \quad (q_1, q_2, q_3, q_4, q_5, q_6) \mapsto f(q_1, q_2, q_3, q_4, q_5, q_6) = \mathrm{quality} \qquad (5) $$

5. EXPERIMENTS AND RESULTS

5.1 Step 1: Normalisation of the property

For each property, different experts were asked to give a mark from 1 (excellent) to 5 (very bad). The cartographers of the COGIT laboratory were considered as the experts (Cécile Duchêne, Jenny Trévisan, Sylvain Bard, Xavier Barillot and Anne Ruas). 30 alignments were chosen. We then plotted the relationship between the mark q_i given by the expert on the X-axis and the value r_i given by the measure on the Y-axis (see Figure 10). Most of the time we noticed that a linear function (r_i = a q_i + b) gave acceptable results. The careful analysis of these graphs allowed us:

- to complete the set of examples tested, to be sure that the examples describe the solution space,
- to detect and analyse cases that seem to be very different from the others (Figure 10, red circle),
- to find mistakes in the basic measures: as an example, we changed the measures of orientation and alignment because the ones we used before were not robust enough.
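The linear relation r_i = a q_i + b noted in step 1 can be estimated and then inverted to obtain a mark from a measured value. The sketch below uses ordinary least squares, which is an assumption: the paper states that the functions were built by analysing the graphs, not by a specific fitting procedure. The function names and the sample data are hypothetical.

```python
def fit_line(marks, measures):
    """Least-squares fit of measures = a * marks + b.

    `marks` are the expert marks q (X-axis), `measures` the computed
    regularity values r (Y-axis).
    """
    n = len(marks)
    mean_q = sum(marks) / n
    mean_r = sum(measures) / n
    sxx = sum((q - mean_q) ** 2 for q in marks)
    sxy = sum((q - mean_q) * (r - mean_r) for q, r in zip(marks, measures))
    a = sxy / sxx
    b = mean_r - a * mean_q
    return a, b


def mark_from_measure(r, a, b):
    """Invert r = a * q + b to get the qualitative value q = (r - b) / a."""
    return (r - b) / a


# Made-up example: marks 1..5 and measures lying exactly on a line.
marks = [1, 2, 3, 4, 5]
measures = [3.6 * q - 2.9 for q in marks]
a, b = fit_line(marks, measures)
```

Inverting the fitted line gives an expression of the form q = (r - b) / a, the same shape as the linear normalisation functions reported for step 1.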



[Figure 10. Relations between quantitative and qualitative values: analysis of results. Panels: 'Size - Expert 3' and 'Size: average of 3 experts'. Axes: value computed r_i against expert mark.]

The analysis showed that some experts were more severe than others. We compared cases for which the marks were different and tried to analyse why. We noticed that some experts gave a bad mark when there was one exception in a regularity, while others were less severe. Nevertheless, we managed to find functions.

[Figure 11. Relations between quantitative and qualitative values: example of proximity and alignment. Left panel: mean proximity, pseudo standard deviation in metres against marks, with a linear trend line whose equation is truncated in the source (y = 3,6524x - 2,...). Right panel: mean alignment measure against marks, with trend line y = 0.7125x² - 0.6708x + 0.3326.]

The functions f_i have been built by analysing these graphs. Bad or ambiguous cases have been removed. Where possible a linear function was chosen; otherwise a piecewise linear function was chosen. The chosen functions are the following:

- if r_align < 2 then q_align = 1.5, else if r_align < 7 then q_align = 3, else q_align = 5
- q_concavity = (0.02 + r_concavity) / 0.03
- q_distance = (2.9 + r_distance) / 3.6
- if r_orientation > 0.35 then q_orientation = 5, else q_orientation = (0.1 + r_orientation) / 0.1
- if r_dist_size > 2.6 or average dist > 2 average size then q_dist_size = 5, else q_dist_size = (0.1 + r_dist_size) / 0.42
- q_size = (65 + r_size) / 64

Figure 12 illustrates the results on three alignments, for which the different properties have been computed.
We took three different types of alignment to illustrate the results. The results can be considered correct if two conditions hold: for a given property, our relative order between alignments is the same as the one computed by the system, and for a given alignment, our relative order between properties is the same as the one computed by the system. Compared with our previous research [3], the results are clearly better.
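The normalisation functions chosen in step 1 translate directly into code. The sketch below is only an illustration: the Python function names are hypothetical, the constant 0.02 in the concavity function should be treated as uncertain because the source text is damaged at that point, and the boolean `far_apart` stands in for the second condition on dist/size ('average dist > 2 average size'), whose exact form is not clear in the source.

```python
def q_align(r):
    """Piecewise constant mark for the alignment measure (metres)."""
    if r < 2:
        return 1.5
    if r < 7:
        return 3.0
    return 5.0


def q_concavity(r):
    # The 0.02 offset is read from a damaged passage; treat as uncertain.
    return (0.02 + r) / 0.03


def q_distance(r):
    return (2.9 + r) / 3.6


def q_orientation(r):
    return 5.0 if r > 0.35 else (0.1 + r) / 0.1


def q_dist_size(r, far_apart=False):
    # far_apart is a caller-supplied stand-in for the source's second
    # condition on mean distance versus mean size (an assumption here).
    return 5.0 if (r > 2.6 or far_apart) else (0.1 + r) / 0.42


def q_size(r):
    return (65 + r) / 64
```

The linear functions are not clamped here, so very small or very large measured values can fall outside the 1 to 5 range of the expert marks; whether the original implementation bounded them is not stated.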


[Figure 12. Qualitative and comparable values of different properties of some alignments. Panels: Alignment 1, Alignment 2; properties shown: concavity, distance, elongation, orientation, dist / size, size; scale from 1 (excellent) to 5 (very bad).]

5.2 Step 2: The aggregation of the normalised values

The task here is to solve equation (5) by means of expert knowledge. We again asked experts to mark some alignments, this time with integer values from 1 (excellent) to 10 (very bad), and we looked for a function that aggregates the q_i values. An infinite number of functions are possible, each giving different weights to the different properties, such as f = Σ λ_i q_i / Σ λ_i, where λ_i represents the importance of the property p_i. We decided to build different candidate aggregation functions and to keep the one that is closest to the values given by the expert. For each candidate function we computed the difference diff given in equation (6), and we kept the function with the smallest value of diff. The method has been tested with 32 alignments (only).

$$\mathrm{diff} = \frac{1}{32} \sum_{j=1}^{32} \left| \mathrm{mark}(al_j) - \frac{\sum_{i=1}^{6} \lambda_i \, q_i(al_j)}{\sum_{i=1}^{6} \lambda_i} \right| \tag{6}$$

We created these aggregation functions automatically by combining predefined weight values λ_i. We first tested constant values (λ_i ∈ {0.5, 1, 1.5, 2}); then, after taking note of the experts' comments, we added weights that change with the value q_i (λ_i = g(q_i)) (see Figure 13). We tested 12 possible weight functions (4 of them constant) for each of the 6 properties, which means that we tested 12^6 configurations. For each of these candidate functions the number diff is computed, and the functions are ordered according to diff. The smaller the value, the closer the function is to the marks given by an expert.

[Figure 13. Example of weight functions used to find the appropriate balance between properties. Panels: more weight when values are bad; more weight when values are good.]
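The search described above can be sketched as an exhaustive enumeration. The sketch below is only an illustration under assumptions: all names are hypothetical, diff is taken as a mean absolute difference, the two q-dependent weight shapes are invented in the spirit of Figure 13, and the sample data are made up on a common scale for simplicity.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Weight = Callable[[float], float]  # lambda_i, possibly depending on q_i


def constant(value: float) -> Weight:
    """Weight function that ignores q and always returns the same value."""
    return lambda q: value


# The four constant weights come from the text; the last two shapes are
# hypothetical examples of weights that vary with q.
CANDIDATES: Sequence[Weight] = [
    constant(0.5), constant(1.0), constant(1.5), constant(2.0),
    lambda q: 0.5 + 0.25 * q,  # hypothetical: more weight when values are bad
    lambda q: 2.0 - 0.25 * q,  # hypothetical: more weight when values are good
]


def aggregate(qs: Sequence[float], weights: Sequence[Weight]) -> float:
    """f = sum(lambda_i * q_i) / sum(lambda_i) for one alignment."""
    lambdas = [w(q) for w, q in zip(weights, qs)]
    return sum(l * q for l, q in zip(lambdas, qs)) / sum(lambdas)


def diff(alignments: Sequence[Sequence[float]], marks: Sequence[float],
         weights: Sequence[Weight]) -> float:
    """Mean absolute difference between expert marks and aggregated values."""
    total = sum(abs(m - aggregate(qs, weights))
                for qs, m in zip(alignments, marks))
    return total / len(marks)


def best_configuration(alignments: Sequence[Sequence[float]],
                       marks: Sequence[float],
                       candidates: Sequence[Weight],
                       n_properties: int) -> Tuple[Weight, ...]:
    """Try every combination of candidate weights and keep the smallest diff."""
    return min(product(candidates, repeat=n_properties),
               key=lambda cfg: diff(alignments, marks, cfg))


if __name__ == "__main__":
    # Made-up q values (3 properties) and marks for four alignments.
    alignments = [[1.0, 2.0, 1.5], [4.0, 5.0, 3.0], [2.0, 2.5, 5.0], [3.0, 1.0, 1.0]]
    marks = [1.5, 4.5, 4.0, 2.0]
    best = best_configuration(alignments, marks, CANDIDATES, n_properties=3)
    print("smallest diff:", round(diff(alignments, marks, best), 3))
```

With 12 candidate weights and 6 properties the enumeration covers 12^6 = 2,985,984 configurations, which is small enough to evaluate exhaustively on 32 alignments.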

This process is performed for each expert. For each expert, the functions are ranked in a homogeneous way, which is important. For all experts the shape is the least important criterion (0.5), whereas the size and the alignment are important. Except for the concavity (shape) and the stretching (dist / size), the preferred weight functions are the positive ones, such as the blue one in Figure 13, left: a very bad regularity (value equal to 4 or 5) degrades the global quality of the alignment. The chosen weight functions are the following:

- Size, alignment and proximity: p = 1 for q in [1, 3]; p = ½ q - ½ for q in [3, 5]
- Orientation: p = ¼ q + ¼
- Concavity: p = ½
- Stretching (size / dist): p = ¾

To validate the aggregation function we (1) compared the quality computed by the new function f with the marks given by the experts, (2) estimated the error by cross validation [8], and (3) computed the quality of new alignments to see whether f orders them as we would (Figure 14).

[Figure 14. Quality value computed by f. Examples: quality = 1.8, good alignment; quality = 3.7, average; quality = 8.4, very bad.]

The cross validation gave an error of 1.7 out of 10, which seems to be a lot. When comparing the marks given by the experts with the marks computed by f, the results and the order are globally correct. However, we noticed that in some cases some experts give very bad marks while f gives a medium value (~ 5). It seems that the rule "when one important criterion is very bad, the alignment is very bad" is more important than we imagined: in such a case, even a specific weight function is not enough, because the value is still mixed with the others. Perhaps the best result is that f is able to separate good from bad alignments with very few mistakes: the order is nearly right, which is what we needed in order to detect the main alignments and to generalise them differently.

6. CONCLUSION

This study, carried out at the COGIT laboratory, aimed to find a way to define the perceptual quality of building alignments. The work was of interest firstly because its results should allow the design of better typification algorithms. The automated detection and characterisation of the main alignments are now within reach, and a typification algorithm is at the design stage. Secondly, decomposing the process in order to build a quality function f allowed us to improve our measures (4) and our knowledge of what defines the quality of an alignment. Last but not least, this combination of expert knowledge and computational capacity is certainly a path we should follow more and more if we want to obtain better results in GIS, and more precisely in generalisation. For us, this experiment was a very instructive step towards a better integration of expertise.

7. REFERENCES

[1] M. Barrault, N. Regnauld, C. Duchêne, K. Haire, C. Baeijs, Y. Demazeau, P. Hardy, W. Mackaness, A. Ruas, R. Weibel, Integrating multi-agent, object oriented and algorithmic techniques for improved automated map generalization. Proc. of the 20th International Cartographic Conference, Beijing, China, 2001.
[2] S. Bard, Quality assessment of cartographic generalisation. Accepted, to be published in Transactions in GIS, 2003.
[3] S. Christophe and A. Ruas, Detecting Building Alignments for Generalisation Purposes. Advances in Spatial Data Handling, 2002.
[4] C. Duchêne, S. Bard, X. Barillot, J. Trevisan, A. Ruas and F. Holzapfel, Quantitative and Qualitative description of Building Orientation.
[5] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, 1993.

[6] R. Weibel, S. Keller and T. Reichenbacher, Overcoming the knowledge acquisition bottleneck in map generalization: the role of interactive systems and computational intelligence. COSIT, Vienna. Lecture Notes in Computer Science No. 988, Springer-Verlag, 1995.
[7] C. Plazanet, N. Bigolin and A. Ruas, Experiments with Learning Techniques for Spatial Model Enrichment and Line Generalization. GeoInformatica 2:4, 1998.
[8] S. Mustière, Apprentissage supervisé pour la généralisation cartographique. PhD Thesis, University Pierre et Marie Curie, Paris, 2001. ftp://ftp.ign.fr/ign/cogit/theses/.

Biography

Anne Ruas is an IGN-France engineer. She completed a PhD on strategies of generalisation based on levels of detail and autonomy. She also led the technical part of the AGENT project. Since 2000, she has managed the COGIT research laboratory, composed of 20 researchers working on generalisation, multiple representation, updating, access to databases and risk management. During this study, in August 2002, Florenze Holzapfel was a student in statistics. She is now a teacher of mathematics.
