Study of the heterogeneity in cultural databases and transformation of examples from CIMI to the CIDOC CRM


Iraklis Karvasonis
Institute of Computer Science, Foundation for Research and Technology Hellas
Science and Technology Park of Crete, P.O. Box 1385, GR , Heraklion, Crete, Greece

Technical Report 291, ICS-FORTH, October 2001

Abstract: The goal of our work was to transform a small sample of different cultural databases to the CIDOC CRM [4] and to study the heterogeneity in data structure between these models. The data structure mapping between these models is also analyzed in this report. The transformations were accomplished with the Data Junction [6] conversion tool, and the final output format is XML, based on a DTD of the CIDOC CRM. The results of this work show that the CIDOC CRM adequately and effectively captures the domain of museum data and semantically covers all the concepts that arose from the source models we examined. Some proposals for improving the data structure of the models we examined, which would make their transformation to the CIDOC CRM easier, are also reported.

Keywords: heterogeneity, transformation, CIDOC CRM, data structure mapping, relational databases, XML

1. General goal and observations:

The goal of our work was to examine the heterogeneity between various data types and the CIDOC Conceptual Reference Model, and to transform all these data from the form supplied by CIMI members to the CIDOC CRM. CIMI [5] is a consortium of cultural heritage institutions and organizations. It has called for testing information integration models with real data from its members [1]. The data we used were Microsoft Access databases that describe objects of 3 museums which are members of CIMI.

The "CIDOC object-oriented Conceptual Reference Model" (CRM) [4] was developed by the ICOM/CIDOC Documentation Standards Group. It represents an 'ontology' for cultural heritage information, i.e. it describes in a formal language the explicit and implicit concepts and relations relevant to the documentation of cultural heritage. The primary role of the CRM is to serve as a basis for mediation of cultural heritage information and thereby provide the semantic 'glue' needed to transform today's disparate, localized information sources into a coherent and valuable global resource.

To understand the problem of mapping to the CIDOC CRM better, we considered some principles for the design of ontologies used for knowledge sharing [8]. We explored the differences between the source data types and the CIDOC CRM definitions and tried to find a way to represent the same meaning in a different form. Our goal is to convert the data so that they are represented, as well as possible and without loss of information, in a form that conforms to the CIDOC CRM.

The data samples we have mapped and converted are from the National Museum of Denmark, the Museum of Natural History London (Clayton Herbarium) and Australian Museums On-Line (AMOL). The semantics of all these samples were completely covered by the CIDOC CRM. There were, however, differences in the complexity and in the degree of automation that could be achieved.

The conversions were accomplished with a commercial conversion tool, Data Junction 7.5 [6], which supports a large variety of conversions and offers many capabilities. The target files are XML instances of the simplest DTD that correctly represents the CIDOC CRM semantics and allows the creation of instances structurally equivalent to correct RDF instances of a full RDFS version of the CIDOC CRM. The target files can be read naturally by using an XSL file that makes the properties visible. For every example we had to identify the mappings from the sample schema to the CRM and then implement and test them. A straightforward first step wraps the whole sample in an XML instance, which takes longer for deeply nested tables; the semantic mapping from XML to XML follows. During the conversions we used different kinds of mapping so that the data would be compatible with the CIDOC CRM. These kinds of mapping and the problems we faced during the conversions are explained in more detail below.

2. Kinds of Mapping:

While studying and processing the example databases we observed many different mapping patterns, depending on the complexity and the structure of each database. Some issues concerning the mapping of objects to relational databases [2] were also considered, for a better understanding of the databases. For every database we worked with, we first converted the database to a full one-to-one representation in XML. We then designed, for every database, a conversion model from this representation to XML based on the CIDOC Conceptual Reference Model and a Document Type Definition (DTD) of it. So, for every field that we initially had from the database, we had either to map it to the appropriate element of the CIDOC CRM or to combine several fields and convert them so that they would be compatible with the CIDOC CRM.
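The one-to-one representation of a table in XML, before any semantic mapping, can be sketched as follows. This is only an illustration in Python under stated assumptions, not the Data Junction procedure used in this work: the function name `table_to_xml`, the `record` wrapper element and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table_name, rows):
    """Wrap every record of a table in XML, one element per field.

    No semantics are added here; the output mirrors the table structure.
    Field names are assumed to be valid XML element names.
    """
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(record, field).text = "" if value is None else str(value)
    return root

# Hypothetical sample record; the values are invented for illustration.
rows = [{"ObjectID": "A123", "Creator": "Unknown"}]
xml_text = ET.tostring(table_to_xml("Object", rows), encoding="unicode")
print(xml_text)
```

The semantic mapping to the CIDOC CRM then operates on this intermediate XML rather than on the database itself.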
In all the following figures that show a kind of mapping, we have an object to which a record of the database refers, a field of this record and the value of this field. A new record is created for the same object; it is a CIDOC CRM DTD instance with some element names, and every element has a value. The element names correspond to property instances of the RDF Schema that describes the CIDOC CRM. Every value of an element belongs to a class of the same RDF Schema. When creating the object as a CRM instance we use global identifiers, each of which is a compound of the name of the identifier used in the database and the value of this identifier. All these cases are examined in more detail below.

1) The first situation is the simplest we found during the conversions of the data. We have an object described by a database record, a field of this record and a value for this field. A new object is created as a CRM instance, and the field name is mapped to the appropriate element of the CIDOC CRM. The value of this element is the value of the mapped field. A graphical representation of this kind of mapping is shown in the following figure.

[Figure 1. The simplest kind of mapping: record A, field name, field value B; CRM record C, element name, element value D]

In figure 1, A stands for the object to which a record of the database as a whole refers. The link from A to B stands for a field of the respective record, and B stands for the value of this field. C stands for the same object as the CRM instance into which we converted A. The link from C to D stands for the element name of the CRM instance, and D stands for the value of this element. The link from C to D in figure 1 and the analogous links in the following examples correspond to RDF property instances, and the C and D values correspond to RDF class instances.
The class to which C belongs depends on the interpretation of the table name, so it can be, for example, a Man-Made Object or a Biological Object. The class to which D belongs depends on the interpretation of the name of the field that we are converting to the CIDOC CRM. So if, for example, the field name is Creator, then D belongs to the class Person. For example, in the data from the AMOL museum, the field ObjectID of the database was converted to the element "is identified by" of the CIDOC CRM. The value of the element "is identified by" is a class instance of the RDF Schema of the CIDOC CRM and belongs to the class E42: Object Identifier.

2) The next situation is more complicated because from one field of the database we create two or more elements of the CIDOC CRM. This happens when the data are complex enough that we have to cut them into substrings and create more than one element. Sometimes it also happens because of the structure of the CIDOC CRM, which needs more depth in an element even in some simple situations. A graphical representation of this kind of mapping is shown in the following figure.

[Figure 2. Creation of more than one element: A, field name, B; C, D, E, F]

The depth of this kind of mapping depends on the complexity and the content of the data. For example, in the AMOL database a field named "Statement" contains the data "Utopia, Northern Territory, Australia". This field was converted to the "took place at" element with data "Utopia". The next element was the "falls within" element with data "Northern Territory", and the final element was also a "falls within" element, with data "Australia". So the initial field was converted into three elements of the CIDOC CRM, and the created path is the following: took place at / falls within / falls within. Each one of the C, D, E, F element values in figure 2 is a class instance, and its value depends on the name of the field that is converted and on the value of this field.

The inverse case, creating only one element from two or more fields, was observed in situations where the relational database uses internal identifiers to connect its tables.
We do not need these identifiers, and after the conversion we have the field we want, with the correct data, in only one element. This was observed in the data of the National Museum of Denmark, where the one-to-one representation of the database in XML was very deeply nested. During the conversion we discarded all internal identifiers, and the final XML file is less deeply nested.

3) In this case the final field is created as a compound of more than one field. The final field represents an ontology with several distinct parts; compounding these parts gives the name of the ontology, which is the value of the field we use in the conversion. Some principles for the design of ontologies were also considered in this situation [8]. A graphical representation of this kind of mapping is shown in the following figure.

[Figure 3. Compound of more than one field: A, B, C, D, E, F, G, with a compound of the fields B, C, D]

For example, in the data from the Clayton museum, the ontology Variety of an object is the compound of the fields Linnaean Genus, Linnaean Species and Linnaean Variety. The produced element is the "assigned" element in the CRM instance, and the value of this element is the value of the ontology Variety.

4) The following case depends on the data in some fields of the databases rather than on the schema only, as in the previous cases. This means that the mapping is based not only on the structure of the database but also on the content of its fields. So we use expressions and conditions during the conversions, whose values and results guide us to the mapping that corresponds to the CIDOC CRM. A graphical representation of this kind of mapping is shown in the following figure.

[Figure 4. The created element depends on the value of some fields of the database: A, B, C, D, E]

The name of the created record and the name and the value of the created element in figure 4 depend on the E and B field values. For example, in the data from the National Museum of Denmark the table Event has a field named FieldID. Depending on the value of this field, we can tell whether the event is an accession event, a measurement event or just a classification event. Depending on the kind of the event, we have to convert this field to the corresponding element of the CRM instance, which can be the element "was measured", "was classified by", "was produced by" etc.

5) The next case of mapping, which we encountered during the conversion of all the databases, is the most complicated. We have two or more parallel fields for the same object, which are interpreted as one path of nested elements by using the information of all the fields. The object created as a CRM instance will have a path of elements that results from the combination of the initial fields. A graphical representation of this kind of mapping is shown in the following figure.

[Figure 5. Combination of more than one field: A, B, C, D, E, F]

In figure 5, the combination of the two fields produces an object that has two successive nested elements. For example, in the data from the Clayton Museum we have the fields State and Country, which describe the place of an event. The field State is converted to the "took place at" element in the CRM instance, and the field Country to the "falls within" element, which is nested in the previous element. So the following path is created: took place at / falls within.
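The combination of parallel fields into one nested path (case 5) can be sketched in Python as below. The element names, the `about` attribute and the sample values are assumptions made for illustration; they only imitate the nesting described above and are not taken from the DTD used in this work.

```python
import xml.etree.ElementTree as ET

def add_place_path(event, state, country):
    """Turn the parallel fields State and Country into one nested path:
    took_place_at (State) with falls_within (Country) nested inside it."""
    took_place_at = ET.SubElement(event, "took_place_at")
    state_place = ET.SubElement(took_place_at, "Place", about=state)
    falls_within = ET.SubElement(state_place, "falls_within")
    ET.SubElement(falls_within, "Place", about=country)
    return event

# Hypothetical field values for one record.
event = add_place_path(ET.Element("Event"), "Virginia", "USA")
print(ET.tostring(event, encoding="unicode"))
```

The same helper, applied twice, would also cover the AMOL "Statement" example of case 2 once the field has been cut into its substrings.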

6) In the last case we have two or more fields for a specific object within a record of the database, and in the object created as a CRM instance we create an intermediate node, which contains, nested in it, the elements to which these fields are converted. A graphical representation of this kind of mapping is shown in the following figure.

[Figure 6. Creation of an intermediate node: A, B, C, D, E, F, G, with an intermediate node]

For example, in the data from the Clayton Museum we have one field that describes the collector of a collection event and another that describes the date of the collection event. After the conversion, we have an object as a CRM instance and the "changed ownership by" element, which describes an event that happened to this object. This element is the intermediate node, and the other elements are nested in it. The collector of the event was converted to the "transferred title to" element, and the date to the "at most within" element. So the following paths are created, both going through the same intermediate node: changed ownership by / transferred title to, and changed ownership by / at most within.

3. Interpretation problems:

During the conversion we had some interpretation problems, because the databases we used did not come with a clear explanation of how some of their fields and tables should be interpreted. The first problem was the ambiguous labels of some fields, which made it hard to understand which category of the CIDOC CRM they fit best. This happens because the CIDOC CRM is a very detailed model that covers a very wide range of cases, in contrast to the simple fields of a database. For example, the fields 'Person', 'Date' etc. do not convey their exact meaning, because they can be interpreted in more than one way. This problem complicated our conversions, and in most cases we solved it by examining the data examples of these fields.
Another problem was fields that contain a lot of information which is not separated in a standard way. For example, a field that describes the person, the place and the date of an event should contain this information in the same way in all the records of the database. Some of the databases we worked with, especially the AMOL database, encode this information differently from one record to another, so that mechanical parsing of these fields becomes impossible. For example, the two records below are from the field "Made from" of the AMOL database. The first record is "Pule, Lena; Utopia Batik Program; Ingkwalalanima camp; Utopia, Northern Territory" and the second is "G & J Weir Ltd; Glasgow, Scotland". The first record contains a person, the name of a project and two places. The second record contains a company and a place. So we could not identify a simple rule that would extract this information to the CRM automatically and fit all the records of the database. The solution to this problem concerns the practice of substructuring data in the database: the same information should be separated in the same way in all the records of the database, or different fields should be created for every piece of information.
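The difficulty with the two AMOL records can be made concrete with a short Python sketch. It assumes that the semicolon is the separator between the parts of the field, which matches the two records quoted above but is not documented for the database as a whole; the function name `split_parts` is hypothetical.

```python
# The two values of the "Made from" field quoted in the text.
records = [
    "Pule, Lena; Utopia Batik Program; Ingkwalalanima camp; Utopia, Northern Territory",
    "G & J Weir Ltd; Glasgow, Scotland",
]

def split_parts(value):
    """Split a field value on semicolons and trim the whitespace of each part."""
    return [part.strip() for part in value.split(";")]

for value in records:
    parts = split_parts(value)
    # A positional rule such as "the first part is a person, the last part is
    # a place" cannot hold for both records: the number of parts differs and
    # the first part is a person in one record and a company in the other.
    print(len(parts), parts)
```

The first record yields four parts and the second only two, so the role of each part cannot be derived from its position alone.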

A further problem we encountered is the redundancy of the same information in a database. Some tables contain the same information in different ways. For example, in the NMD data the event for the measurement of an object is described once in general and is then repeated with the details of the measurement. In some tables, several fields each contain a piece of information, and another field of the same table combines all of them. This redundancy is meaningless for the CIDOC CRM and for our conversion, because from the combined field we can retrieve all the pieces of information we need. For example, in the NMD database the fields PrefixCharacter, PrefixNumber, NumberPart, SuffixCharacter, SuffixNumber of the table Object are parts of the field Inventory Number of the same table. So we need only the Inventory Number field, and the others are not applicable for the conversion. Below we present some of the examples we examined, with explanations.

4. First Example: Data of the National Museum of Denmark (NMD)

The source file of this conversion is a database in Microsoft Access, and the graphical representation of the relations between its tables is shown in the picture below. The target file is an XML file and is based on the CIDOC Model. For this conversion we also used a DTD for the CIDOC CRM. This database has a lot of tables, and some information is repeated in some tables more than once. After the one-to-one representation of the database in XML we had very deep nesting, because the database uses a lot of identifiers to connect its tables. These identifiers are not applicable to the CIDOC CRM. So we had to discard these identifiers and design a conversion model with the appropriate mapping to represent all the information correctly in the CIDOC CRM.
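Discarding internal identifiers while keeping the information they point to can be sketched as follows, using the tables "Event Actor" and "Index Actor Role" of the NMD schema. The table and field names come from the NMD database; the row values, the key `in_the_role_of` and the function name are hypothetical and serve only as an illustration.

```python
# Lookup table "Index Actor Role": IndexActorRoleId -> IndexActorRoleName.
# The two entries are invented sample data.
INDEX_ACTOR_ROLE = {1: "collector", 2: "donor"}

# Identifiers that only connect tables and carry no meaning for the CRM.
INTERNAL_KEYS = {"EventActorId", "EventId"}

def resolve_event_actor(row):
    """Drop the internal keys of an "Event Actor" row and replace the role
    identifier with the role name it refers to."""
    kept = {k: v for k, v in row.items() if k not in INTERNAL_KEYS}
    role_id = kept.pop("IndexActorRoleId")
    kept["in_the_role_of"] = INDEX_ACTOR_ROLE[role_id]
    return kept

row = {"EventActorId": 10, "EventId": 7, "IndexActorRoleId": 2, "ActorId": 55}
resolved = resolve_event_actor(row)
print(resolved)
```

In a full conversion the remaining ActorId would be resolved against the table "Index Actor" in the same way, which is why the converted XML is less deeply nested than the one-to-one representation.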
We had a lot of problems in achieving a correct mapping, because the data were complicated and some fields of the database were not explained well enough for us to understand easily which element of the CIDOC CRM they correspond to. The mapping also sometimes depended on the value of the data and was not the same for the same fields. This means that some fields of the database, for example those that describe events, have more than one corresponding element in the CIDOC CRM, depending on their value. So the mapping becomes more complicated, but in general the data were converted to the CIDOC CRM with their entire initial meaning and content.

The NMD data are detailed enough to allow a completely automatic transformation. Two default assumptions that were not obvious from the data could be clarified with the creator and expanded into the CRM. The events of the classification and the use of an object are different in the CRM, but in the NMD data they are presented together in one event. The same happens with the events of the measurement and the acquisition. Further, as the NMD database uses dynamic types of events, a full mapping of the NMD event types to the CRM classes could have improved the mapping. The relations between the tables of the database, the mapping to the CIDOC CRM and an example of the final XML file with the XSL we created are presented below.

4.1 The relations between the tables of the NMD database

[Figure 7. The relations between the tables of the NMD database]

4.2 NMD to CRM mapping

The NMD to CRM mapping, which is presented below, is based on the Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM [7].

Table "Object":
NMD [E22: Man-Made Object]
NMD.ObjectId = E41: Appellation
NMD->NMD.ObjectId = P1 is identified by: Appellation
NMD.PrefixChacter, NMD.PrefixNumber, NMD.Year, NMD.NumberPart, NMD.SuffixCharacter, NMD.SuffixNumber, NMD.SuffixExensionCharacter, NMD.SuffixExensionNumber, NMD.PersonWhoCreatedRecord, NMD.DateWhenRecordWasCreated = not applicable
NMD.InventoryNumber = E42: Object Identifier
NMD->NMD.InventoryNumber = P47 is identified by: Object Identifier

Table "Hierarchy":
NMD.HierarchyId, NMD.ObjectId = not applicable
NMD.EventId = P12 was present at or P39 was measured or P41 was classified, see NMD.Event->EventCode
NMD.ParentObjectId = P46 forms part of

Table "Event":
NMD.EventID = see Hierarchy->HierarchyId
NMD.EventCode = E55: Event Type
NMD->NMD.EventCode = P2 has type: Event Type
NMD.IndexRegion1Id = E41: Appellation
NMD->NMD.IndexRegion1Id = P7 took place at P87 is identified by: Appellation
NMD.IndexRegion2Id = E41: Appellation
NMD->NMD.IndexRegion2Id = P7 took place at P87 is identified by: Appellation
NMD.IndexPlaceName1 = E53: Place
NMD->NMD.IndexPlaceName1 = P7 took place at: Place
NMD.IndexPlaceName2 = E53: Place
NMD->NMD.IndexPlaceName2 = P7 took place at: Place
NMD.ActorId = not applicable
NMD.CulturalPeriod = E52: Time Span
NMD->NMD.CulturalPeriod = P86 falls within: Time Span
NMD.StartTimePresentation = E52: Time Span
NMD->NMD.StartTimePresentation = P82 at most within P79 begins at: Time Span
NMD.EndTimePresentation = E52: Time Span

NMD->NMD.EndTimePresentation = P82 at most within P80 ends at: Time Span
NMD.StartTime = E52: Time Span
NMD->NMD.StartTime = P82 at most within P79 begins at: Time Span
NMD.EndTime = E52: Time Span
NMD->NMD.EndTime = P82 at most within P80 ends at: Time Span
NMD.EventNote = P3 has note
NMD.RecordCreationPerson, NMD.RecordCreationDate = not applicable

Table "Event Actor":
NMD.EventActorId, NMD.EventId = not applicable
NMD.IndexActorRoleId = E55: carried out by Type, see table IndexActorRole->RoleId
NMD->NMD.IndexActorRoleId = P14 in the role of: carried out by Type
NMD.ActorId = E39: Actor, see table IndexActor->ActorId
NMD->NMD.ActorId = P14 carried out by: Actor

Table "Index Event Type":
NMD.EventCode = E41: Appellation
NMD->NMD.EventCode = P1 is identified by: Appellation
NMD.EventCode1Name = E55: Event Type
NMD->NMD.EventCode1Name = P2 has type: Event Type
NMD.EventCode2Name = E55: Event Type
NMD->NMD.EventCode2Name = P2 has type: Event Type

Index "Region 1":
NMD.IndexRegion1ID = E41: Appellation
NMD->NMD.IndexRegion1ID = P7 took place at P87 is identified by: Appellation
NMD.IndexRegion1Name = E53: Place
NMD->NMD.IndexRegion1Name = P7 took place at: Place

Index "Region 2":
NMD.IndexRegion2ID = E41: Appellation
NMD->NMD.IndexRegion2ID = P7 took place at P87 is identified by: Appellation
NMD.IndexRegion2Name = E53: Place
NMD->NMD.IndexRegion2Name = P7 took place at: Place

Table "Index Actor":
NMD.ActorId = E41: Appellation
NMD->NMD.ActorId = P1 is identified by: Appellation
NMD.Initials, NMD.Acronym, NMD.RecordCreationPerson, NMD.RecordCreationDate = not applicable
NMD.Title, NMD.SurNames, NMD.FirstNames, NMD.Note = P3 has note
NMD.StreetAndNumber, NMD.Town, NMD.State, NMD.Country, NMD.Telephone, NMD.PostalCode = E45: Address
NMD->NMD.StreetAndNumber, NMD->NMD.Town, NMD->NMD.State, NMD->NMD.Country, NMD->NMD.Telephone, NMD->NMD.PostalCode = P76 has contact point: Address

Table "Index Actor Role":
NMD.IndexActorRoleId = not applicable
NMD.IndexActorRoleName = E55: carried out by Type
NMD->NMD.IndexActorRoleName = P14 in the role of: carried out by Type

Table "Dimension":
NMD.ObjectDimensionId, NMD.Condition, NMD.RecordCreationPerson, NMD.RecordCreationDate = not applicable
NMD.ObjectId = E54: Dimension or E55: Man-Made Object Type
NMD->NMD.ObjectId = P40 observed dimension: Dimension or P42 assigned: Man-Made Object Type
NMD.EventId = E16: Measurement or E17: Assignment or E8: Acquisition
NMD->NMD.EventId = P39 was measured: Measurement or P41 was classified by: Assignment

Table "Object Form Material":
NMD.ObjectFormMaterialID, NMD.ObjectDimensionID, NMD.Reliability, NMD.OrderOfMaterialEntry = not applicable
NMD.MaterialOriginalentry = E57: Material
NMD->NMD.MaterialOriginalentry = P45 consists of: Material
NMD.MaterialCorrection = E57: Material
NMD->NMD.MaterialCorrection = P45 consists of: Material
NMD.ProductionMethode = P3 has note "Method: "
NMD.Color = P3 has note "Color: "

Table "Object Form Measurement":
NMD.ObjectFormMeassurmentID, NMD.ObjectDimensionId, NMD.SpecialMeassurement = not applicable
NMD.IndexMeassurementTypeId = P2 has type, see table IndexmeasurementType->TypeName
NMD.IndexUnitOfMeassurementId = P91 unit, see table IndexMeasurement->UnitTypeName
NMD.Meassurement = P90 value

Table "Index Measurement":
NMD.IndexMeassurementUnitTypeID = not applicable
NMD.IndexMeassurementUnitTypeName = P40 observed dimension P91 unit

Table "Index Measurement Type":
NMD.IndexMeassurementTypeID = not applicable
NMD.IndexMeassurementTypeName = E55: Dimension Type
NMD->NMD.IndexMeassurementTypeName = P40 observed dimension P2 has type: Dimension Type

Table "Object Role Classification":
NMD.ObjectRoleClassificationId, NMD.ObjectDimensionId = not applicable
NMD.IndexClassification1Id = see table IndexClassification1->Name
NMD.IndexClassification2Id = see table IndexClassification1->Name

Table "Index Classification 1":
NMD.IndexClassification1Id = not applicable
NMD.IndexClassification1Name = E55: Man-Made Object Type
NMD->NMD.IndexClassification1Name = P2 has type: Man-Made Object Type

Table "Index Classification 2":
NMD.IndexClassification2Id = not applicable
NMD.IndexClassification2Name = E55: Man-Made Object Type
NMD->NMD.IndexClassification2Name = P2 has type: Man-Made Object Type

Table "Object Photo":
NMD.ObjectPhotoId, NMD.ObjectId = not applicable
NMD.EventId = P108 was produced by
NMD.PhotoNumber = E3: Document, E38: Image
NMD->NMD.PhotoNumber = P70 is documented in: Document, Image

Table "Index Capture Type":
NMD.IndexCaptureTypeId = not applicable
NMD.IndexCaptureTypeName = P70 is documented in P3 has note

4.3 An example of the final NMD archive with the XSL

XML is a language that can represent a relational database [3]. So every database we worked with was first transformed to a one-to-one representation in XML [10]. Some architectural issues for integrating XML and relational database systems were also considered [9]. All the final XML files that came out of the conversions are structured like the following RDF excerpt:

    <!-- Description of Epitaphios GE -->
    <crm:e23.iconographic_object rdf:about="epitaphios_ge34604">
      <crm:p19.1f.is_identified_by>
        <crm:e42.object_identifier rdf:about="TA_959a"/>
      </crm:p19.1f.is_identified_by>
      <crm:p19.1f.is_identified_by>
        <crm:e42.object_identifier rdf:about="GE_34604"/>
      </crm:p19.1f.is_identified_by>
      <crm:p19.3f.preferred_identifier_is rdf:resource="ge_34604"/>
      <crm:p1.1f.has_type>
        <crm:e55.type rdf:about="ecclesiastical_embroidery"/>
      </crm:p1.1f.has_type>
      <crm:p1.1f.has_type>
        <crm:e55.type rdf:about="liturgical_cloth"/>
      </crm:p1.1f.has_type>

The element <crm:p19.1f.is_identified_by> is a property instance, the attribute rdf:about="TA_959a" gives the value of this element, and the nested element crm:e42.object_identifier names the class to which this value belongs. So in the NMD example, which is presented below, and in all the other examples we have the following correspondence. The value "mandsfigur" of the example below is the value of an element and corresponds to the rdf:about attribute of the above RDF excerpt. The value "is identified by" is a property instance and so corresponds to a property element of the above RDF excerpt. Finally, the value (E22: Man-Made Object) is a class instance and corresponds to a crm class element of the above RDF excerpt.
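The correspondence between property elements, class elements and values in the RDF excerpt can be checked with a short Python sketch. It wraps a shortened copy of the excerpt in an rdf:RDF root so that it parses on its own. The crm namespace URI is a placeholder, because the excerpt does not declare one, and the helper `local` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://example.org/crm#"  # placeholder URI, not given in the report

doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:crm="{CRM}">
  <crm:e23.iconographic_object rdf:about="epitaphios_ge34604">
    <crm:p19.1f.is_identified_by>
      <crm:e42.object_identifier rdf:about="TA_959a"/>
    </crm:p19.1f.is_identified_by>
    <crm:p1.1f.has_type>
      <crm:e55.type rdf:about="liturgical_cloth"/>
    </crm:p1.1f.has_type>
  </crm:e23.iconographic_object>
</rdf:RDF>"""

def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.split("}", 1)[1]

subject = ET.fromstring(doc)[0]
triples = []
for prop in subject:          # property instances
    for value in prop:        # class elements carrying the value
        triples.append(
            (local(prop.tag), local(value.tag), value.get(f"{{{RDF}}}about"))
        )

for triple in triples:
    print(triple)
```

Each printed triple lists the property element, the class element and the rdf:about value, which are the three roles distinguished in the paragraph above.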


5. Second Example: Data of the Clayton Museum

The source file of this conversion is a database in Microsoft Access. The target file is an XML file and is based on the CIDOC Model. For this conversion we also used a DTD for the CIDOC CRM. The Clayton Herbarium sample is as detailed as the NMD sample, even though it is encoded in a single table, which contains all the information about the objects of the museum. So it is easier to convert these data to the CIDOC CRM than the data of the other databases. By following some of the kinds of mapping we described above, we represented all the information in the CIDOC CRM. Even though the Clayton database is not in any normal form (for example, it repeats the same fields for a second event), it can be mapped without any difficulty. Some pieces of information are repeated more than once, and on some occasions the data of two or more fields are also contained in a more general field, which is unnecessary because we can take this information from the other fields of the database. Some fields also hold information that is of no use to the CIDOC CRM, or no information at all, so we did not have to convert them. The fields of the database, the mapping to the CIDOC CRM and an example of the final XML file with the XSL we created are presented below.
5.1 The fields of the database:

RowID: Unique number
Update1999: Indicates record updated but now (in 2001) redundant
LinnaeanGenus: Generic name (where name described by Linnaeus)
LinnaeanSpecies: Species name (where name described by Linnaeus)
LinnaeanVariety: Varietal name (if any) (where name described by Linnaeus)
ClaytonNo: Clayton collection number
OldBarcode: Old Barcode
Barcode: New Barcode = image filename
Image: Confirms presence of image (Yes/No)
LTPuniqueNo: Where name described by Linnaeus this refers to a unique number for that particular name in a database belonging to the Linnaean Typification Project.
SpecimenoAtBM: Specimen actually present at BM or found at BM (Yes/No)
DuplicateAtLINN: Duplicate specimens at the Linnaean Society (Yes/No)
Country: Country of origin of specimen
State: State in country of origin of specimen
Collector: Collector of specimen
CollectionDate: Date of collection of specimen
FVPhraseName: Corresponding phrase name for specimen in Flora Virginica
FloraVirginicaEdition: Edition 1 or 2 of Flora Virginica
FloraVirginicaPage: Page number
Determination1Name: Any determination (genus and species)
Determination1Genus: Any determination - just genus
Determination1Species: Any determination - just species
Determination1Author: Authority for determination name
Determination1InfraRank: Rank i.e. variety or subspecies if determination made at this level
Determination1InfraName: Name of subspecies or variety if determination made at this level
Determination1InfraAuthor: Authority for subspecies or varietal name if determination made at this level
Determination1ByDate: Name of person who has made determination and date

Determination2Name, etc.: (as above)
LinnaeanAuthority: Authority for any Linnaean name (by definition, Linnaeus)
LinnaeanReference: Bibliographic reference for Linnaean name (i.e. place of description)
LinnaeanVolume: Relevant volume for any Linnaean bibliographic reference
LinnaeanPage: Relevant page for any Linnaean bibliographic reference
LinnaeanYear: Year of publication of any Linnaean name
LinnaeanTypeStatus: Type status of specimen in relation to any Linnaean name
CurrentDivision: Division of the current name of any Linnaean name
CurrentFamily: Family of current name of any Linnaean name
Genus: genus of current name of any Linnaean name
Species: species of current name of any Linnaean name
CurrentSpeciesAuthor: authority of current species name
CurrentSubspecies: subspecies (if any) of current name of any Linnaean name
CurrentSubspeciesAuthor: authority of any current subspecies name
CurrentVariety: variety (if any) of current name of any Linnaean name
CurrentVarietyAuthor: authority of any current varietal name
Comments: Any comments or notes with regard to particular specimen

5.2 Clayton to CRM Mapping:

The Clayton to CRM mapping, which is presented below, is based on the Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM [7].
Clayton [E20: Biological Object]
Clayton.OldBarcode = E42: Object Identifier
Clayton->Clayton.OldBarcode = P47 is identified by: Object Identifier
Clayton.Barcode = E42: Object Identifier
Clayton->Clayton.Barcode = P48 preferred identifier is: Object Identifier
Clayton.Image = E31: Document
Clayton->Clayton.Image = P70 is documented in: Document

Clayton [E8: Acquisition]
Clayton.Collector = E39: Actor
Clayton->Clayton.Collector = P24 changed ownership by P22 transferred title to: Actor
Clayton.State = E53: Place
Clayton->Clayton.State = P24 changed ownership by P22 transferred title to P7 took place at: Place
Clayton.Country = E53: Place
Clayton->Clayton.Country = P24 changed ownership by P22 transferred title to P7 took place at P89 falls within: Place
Clayton.CollectionDate = E52: Time Span
Clayton->Clayton.CollectionDate = P24 changed ownership by P82 at most within: Time Span
Clayton.ClaytonNo = E55: Plant Species Type
Clayton->Clayton.ClaytonNo = P2 has type: Plant Species Type
Clayton.ClaytonNo = E41: Appellation
Clayton->Clayton.ClaytonNo = P2 has type P1 is identified by: Appellation
Clayton.LinnaeanGenus + Clayton.LinnaeanSpecies = E41: Appellation
Clayton->Clayton.LinnaeanGenus + Clayton->Clayton.LinnaeanSpecies = P2 has type P1 is identified by: Appellation

18 Clayton.LTPuniqueNo = E41: Appellation Clayton->Clayton.LTPuniqueNo = P2 has type P1 is identified by P1 is identified by: Appellation Clayton.LinnaeanReference = E32: Authority Document Clayton->Clayton.LinnaeanReference = P2 has type P1 is identified by P67 is referred to by: Authority Document Clayton->Clayton.LinnaeanVolume + Clayton->Clayton.LinnaeanPage + Clayton->Clayton.LinnaeanYear = P2 has type P1 is identified by P67 is referred to by P3 has note Assignment Clayton.LinnaeanTypeStatus = E55: Type Clayton->Clayton.LinnaeanTypeStatus = P2 has type: Type Clayton.Determination1 = E17: Type Assignment Clayton->Clayton.Determination1 = P41 was classified by: Type Clayton.Determination1By = E39: Actor Clayton->Clayton.Determination1By = P41 was classified by P14 carried out by: Actor Clayton.Determination1Date = E52: Time Span Clayton->Clayton.Determination1Date = P41 was classified by P82 at most within: Time Span Clayton.Determination1Genus = E55: Genus Type Clayton->Clayton.Determination1Genus = P41 was classified by P42 assigned: Genus Type Clayton.Determination1Species = E55: Species Type Clayton->Clayton.Determination1Species = P41 was classified by P42 assigned: Species Type Assignment Clayton.Determination2 = E17: Type Assignment Clayton->Clayton.Determination2 = P41 was classified by: Type Clayton.Determination2By = E39: Actor Clayton->Clayton.Determination2By = P41 was classified by P14 carried out by: Actor Clayton.Determination2Date = E52: Time Span Clayton->Clayton.Determination2Date = P41 was classified by P82 at most within: Time Span Clayton.Determination2Genus = E55: Genus Type Clayton->Clayton.Determination2Genus = P41 was classified by P42 assigned: Genus Type Clayton.Determination2Species = E55: Species Type Clayton->Clayton.Determination2Species = P41 was classified by P42 assigned: Species Type Clayton.FloraVirginica = E32: Authority Document Clayton->Clayton.FloraVirginica = P67 is referred to by: Authority Document 
Clayton->Clayton.FloraVirginicaEdition + Clayton- >Clayton.FloraVirginicaPage + Clayton->Clayton.FloraVirginicaName = P67 is referred to by P3 has note Clayton->Clayton.Comments = P3 has note 18
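A mapping of this kind can be read as a table of rules, each sending one source field along a path of CRM properties to a target entity. The following is only a minimal sketch of that reading in Python, not the Data Junction implementation used in this work. The rule contents are taken from the mapping above, but the XML element names, the helper names `xml_name` and `record_to_crm`, and the sample values are assumptions made for illustration; the actual CIDOC CRM DTD is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Source field -> (property path starting at the E20 object, target entity).
# Only a few rules from the Clayton mapping are shown; an entity of None means
# the value is stored directly as the text of the last property (a note).
RULES = {
    "Barcode": (["P48 preferred identifier is"], "E42 Object Identifier"),
    "Determination1By": (["P41 was classified by", "P14 carried out by"], "E39 Actor"),
    "Determination1Date": (["P41 was classified by", "P82 at most within"], "E52 Time Span"),
    "Comments": (["P3 has note"], None),
}


def xml_name(label):
    """Turn a CRM label such as 'P14 carried out by' into a legal XML name."""
    return label.replace(" ", "_")


def record_to_crm(record):
    """Build a nested XML tree for one source record (hypothetical element names)."""
    root = ET.Element("E20_Biological_Object")
    for field, (path, entity) in RULES.items():
        value = record.get(field)
        if not value:
            continue  # empty source fields produce no CRM element
        node = root
        for prop in path:
            # Reuse an existing property element so that, for example, all
            # Determination1 fields hang off the same type assignment.
            existing = node.find(xml_name(prop))
            node = existing if existing is not None else ET.SubElement(node, xml_name(prop))
        if entity is not None:
            node = ET.SubElement(node, xml_name(entity))
        node.text = value
    return root


# Made-up sample record, for illustration only.
sample = {"Barcode": "X1", "Determination1By": "A. Botanist", "Determination1Date": "1990"}
print(ET.tostring(record_to_crm(sample), encoding="unicode"))
```

Reusing an existing property element keeps the fields of one determination under a single type assignment. A fuller version would need separate rule groups for Determination1 and Determination2 so that the two assignments are not merged.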

5.3 An example of the final CLAYTON archive with the XSL


6. Third Example: Data of the AMOL Museum

The source of this conversion is a Microsoft Access database. The target is an XML file based on the CIDOC CRM. For this conversion we again used a DTD for the CIDOC CRM. The database has two tables: the first contains all the information about the objects it describes, and the second contains the photos that correspond to the objects.

The problem with this database is that its fields pack together a lot of information that should have been separated into more fields. As a result, the same information is repeated many times and is not easily parsed for the conversion. The records are not written in a standard way, so the information lacks a structure that can be easily processed. The database has fields with weak semantics, such as Description, Statement and MadeNote. These seem to function mainly as formatting devices, in the tradition of museum catalogs, but cannot be used to interpret semantics. The disciplined use of some separators would have helped us more in the conversions. As the data stand, automatic interpretation needs background knowledge (authorities for place names, person names, materials, organization names and object types), heuristics and eventually natural language interpretation.

Therefore we show here the result of a manual transformation, which demonstrates that the CRM captures completely the meaning of these data. This analysis may be useful for proposing some kind of tagging scheme for the AMOL database that would better facilitate automatic processing. The fields of the database and an example of the final XML archive with the XSL we created are presented below.
6.1 The fields of the database:

ObjectID: Unique number for the database use only
Name: Name of the object
Statement: Contains complex information about the creator, the place and time of the creation
Designed: Name of the designer and place of this act
Made: Contains complex information about creator and place of creation
Date: Date of the creation
DateType: Type of the creation
Description: A description of the object
Marks: Inscriptions on the object
Dimensions: The dimensions of the object
DesignedNote: Note for the design of the object
MadeNote: Note for the creation of the object
UsedNote: Note for the use of the object
Used: Contains complex information about the person, the place and the date of the use of the object
OwnedExchange: Contains complex information about the person or the company and the place of this action
OwnedExchangeNote: Note for the previous action
Subject: The subject to which the object is related
Category: The category to which the object belongs
CollDevField: More general categories to which the object belongs
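The remark in section 6 that a disciplined use of separators would have helped can be illustrated with a small sketch. It assumes a purely hypothetical convention in which a complex field such as Made holds exactly three parts (actor, place, date) separated by semicolons. Neither this convention nor the sample values come from the AMOL data, and the function name `split_field` is invented for the example.

```python
def split_field(value, parts=("actor", "place", "date"), sep=";"):
    """Split a complex field into named parts under an assumed separator convention.

    Returns None when the value does not follow the convention, so that such
    records can be set aside for manual interpretation.
    """
    pieces = [piece.strip() for piece in value.split(sep)]
    if len(pieces) != len(parts):
        return None
    # Empty parts are kept as None so that a missing actor or date stays explicit.
    return {name: piece or None for name, piece in zip(parts, pieces)}


# Made-up values, for illustration only.
print(split_field("John Smith; Sydney; 1923"))
print(split_field("Made in Sydney about 1923"))
```

Under such a convention each part could be mapped to its own CRM element, while free-text values that do not match would still need the background knowledge and heuristics described in section 6.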

6.2 An example of the final AMOL archive with the XSL:


7. Conclusions:

Having examined all the databases and converted them to the CIDOC CRM, we can say that all the data were successfully converted without loss of information, although we faced many problems during the conversions. The CIDOC CRM offers a very wide range of constructs for representing and explaining objects, events and everything else that refers to a museum object. Some of the databases, however, had structural problems, and some of their fields were ambiguous: we did not know exactly which element of the CIDOC CRM they fit, because the model has many detailed elements for the representation of an object. In other words, some of the source databases were underspecified, because the CIDOC CRM has more than one element for the same database value and covers a very wide variety of circumstances.

The CIDOC CRM captures adequately and effectively the domain of museum data and covers semantically all the concepts that arose from the source databases we examined. The difficulty was therefore never whether the CIDOC CRM covers a concept, but only which element corresponds most properly to it. This is because, as described above, the CIDOC CRM has many detailed elements for the representation of an object. The complexity of the mapping is thus typically due to the intrinsic complexity of interpreting cultural data sources and is by no means introduced by the CRM.

To decode and convert the databases, we first converted each database to a one-to-one XML representation with the Data Junction conversion tool. Then we designed a conversion model, which shows how every field of the databases is converted to the CIDOC CRM. This means that every field of every record of the database has a corresponding element in the CIDOC CRM. Finally, we used the Data Junction conversion tool to implement the conversion model we designed.
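The first step of this procedure, the one-to-one XML representation, and the requirement that every field have a corresponding CRM element can be sketched as follows. This is an illustrative sketch in Python, not the Data Junction workflow itself. The function names `row_to_xml` and `unmapped_fields`, the sample row and the abbreviated conversion model are assumptions, and column names are assumed to be legal XML names.

```python
import xml.etree.ElementTree as ET


def row_to_xml(table, row):
    """One-to-one representation: one element per record, one child per column."""
    record = ET.Element(table)
    for column, value in row.items():
        # NULL values become empty elements so that no column is silently lost.
        ET.SubElement(record, column).text = "" if value is None else str(value)
    return record


def unmapped_fields(record, model):
    """List the fields of a record for which the conversion model has no rule."""
    return [child.tag for child in record if child.tag not in model]


# Made-up row and a deliberately incomplete conversion model, for illustration.
row = {"Barcode": "X1", "Collector": "J. Clayton", "Comments": None}
model = {"Barcode": "E42: Object Identifier", "Collector": "E39: Actor"}

record = row_to_xml("Clayton", row)
print(ET.tostring(record, encoding="unicode"))
print(unmapped_fields(record, model))
```

A check of this kind makes it easy to confirm, before running the second conversion step, that the conversion model covers every field of the source table.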
During the conversions we used several different kinds of mapping, which helped us to complete them. Some of the problems we had were due to the heterogeneity between the databases and the CIDOC CRM. The main heterogeneity problems were the ambiguous naming of some database fields, the repetition of the same information within the same database and the non-standard structure of the records within the same database. The naming of the fields may be sufficient for the database itself, but it is not sufficient for the CIDOC CRM, which is a more detailed model. Repeated information was carefully examined so that only the part we needed was converted to the corresponding elements of the CIDOC CRM. The records of a database should also share a standard structure, because otherwise they cause parsing and mapping problems.

Some databases also contain a lot of information that is difficult to parse and to map to the CIDOC CRM. For example, in the AMOL database some general fields contain a lot of information, which is not uniformly distributed and cannot be easily parsed. The solution for such databases is either to distribute the information uniformly across all records or to divide the generic fields into several more specific fields. Natural language analysis and the comparison of field values with a thesaurus are also a good solution where the fields are not properly labeled or the meaning of the information is unclear. The information can then be mapped much more easily to the correct elements of the CIDOC CRM. The AMOL data thus show that the CIDOC CRM could be used to design and introduce a moderate structuring that facilitates semantic interpretation and is easily comprehensible to end-user documentalists.
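The comparison of field values with a thesaurus mentioned above can be sketched very simply. The thesaurus contents, the category names and the function name `classify` below are invented for illustration. A real application would draw on the place name, person name, material and object type authorities discussed in section 6.

```python
# Tiny made-up thesaurus: category -> known terms (lower case).
THESAURUS = {
    "place": {"sydney", "melbourne", "china"},
    "material": {"wood", "brass", "china"},
}


def classify(value, thesaurus=THESAURUS):
    """Return the sorted list of categories whose term list contains the value.

    An empty list means the value is unknown; more than one category means
    the value is ambiguous and needs context or manual interpretation.
    """
    key = value.strip().lower()
    return sorted(category for category, terms in thesaurus.items() if key in terms)


print(classify(" Sydney "))
print(classify("China"))
print(classify("unknown term"))
```

Returning every matching category, instead of only the first, keeps ambiguous values visible so that they can be resolved by heuristics or by a documentalist.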
The Clayton data also show that such structuring need in no way be as complex and deep as the CRM, nor does the end user need to fully understand the CRM. All data samples show that the CRM instances are comprehensible, even though the presented form was designed not for presentation but for understanding the machine-interpretable raw data themselves.

Now that the conversion is complete, the whole National Museum of Denmark (NMD) sample and the Natural History Museum, London (Clayton) sample can be transformed without manual intervention. This means that every database with the same structure as the NMD and Clayton databases can easily be converted to the CRM automatically. We believe that CRM instances are now ready for automatic integration, provided that persons and other entities can be sufficiently identified globally. This again is a problem of the integration process and not of the CRM. Thus, we foresee that automatic integration could be achieved through the use of the global identifiers that are used in the CRM instances.

Finally, this test shows that a non-domain expert with ordinary knowledge of IT tools can execute the transformation with brief advice from a domain expert who is also knowledgeable about the CRM. This advice is needed once per database, not per data item, if the data are sufficiently structured. This intellectual investment cannot be avoided in any intelligent data integration that tries to preserve and respect the intellectual qualities of our cultural heritage information.

8. References and Bibliography

[1] ABC/Harmony CIMI Collaboration Project,
[2] Scott Ambler: Mapping Objects to Relational Databases, October 2000
[3] Ronald Bourret: XML and Databases,
[4] CIDOC Conceptual Reference Model (see
[5] CIMI organization (see
[6] Data Junction conversion tool (see
[7] Martin Doerr: Mapping of the Dublin Core Metadata Element Set to the CIDOC CRM, July 2000
[8] Thomas Gruber: Toward Principles for the Design of Ontologies Used for Knowledge Sharing, August 1993
[9] Gerti Kappel, Elisabeth Kapsammer, Werner Retschitzegger: Architectural Issues for Integrating XML and Relational Database Systems: The X-Ray Approach
[10] W3C: World Wide Web Consortium, XML representation of a relational Database (see


More information

Envisioning Semantic Web Technology Solutions for the Arts

Envisioning Semantic Web Technology Solutions for the Arts Information Integration Intelligence Solutions Envisioning Semantic Web Technology Solutions for the Arts Semantic Web and CIDOC CRM Workshop Ralph Hodgson, CTO, TopQuadrant National Museum of the American

More information

INTRODUCING THE UNIFIED E-BOOK FORMAT AND A HYBRID LIBRARY 2.0 APPLICATION MODEL BASED ON IT. 1. Introduction

INTRODUCING THE UNIFIED E-BOOK FORMAT AND A HYBRID LIBRARY 2.0 APPLICATION MODEL BASED ON IT. 1. Introduction Преглед НЦД 14 (2009), 43 52 Teo Eterović, Nedim Šrndić INTRODUCING THE UNIFIED E-BOOK FORMAT AND A HYBRID LIBRARY 2.0 APPLICATION MODEL BASED ON IT Abstract: We introduce Unified e-book Format (UeBF)

More information

Abstract The CIDOC Conceptual Reference Model (CRM) is regarded as an interoperability solution for integrating heterogeneous metadata in the

Abstract The CIDOC Conceptual Reference Model (CRM) is regarded as an interoperability solution for integrating heterogeneous metadata in the Abstract The CIDOC Conceptual Reference Model (CRM) is regarded as an interoperability solution for integrating heterogeneous metadata in the cultural heritage domain. The major problem developers are

More information

The Dublin Core Metadata Element Set

The Dublin Core Metadata Element Set ISSN: 1041-5635 The Dublin Core Metadata Element Set Abstract: Defines fifteen metadata elements for resource description in a crossdisciplinary information environment. A proposed American National Standard

More information

Ontology - based Semantic Value Conversion

Ontology - based Semantic Value Conversion International Journal of Computer Techniques Volume 4 Issue 5, September October 2017 RESEARCH ARTICLE Ontology - based Semantic Value Conversion JieWang 1 1 (School of Computer Science, Jinan University,

More information

CULTURAL DOCUMENTATION: THE CLIO SYSTEM. Panos Constantopoulos. University of Crete and Foundation of Research and Technology - Hellas

CULTURAL DOCUMENTATION: THE CLIO SYSTEM. Panos Constantopoulos. University of Crete and Foundation of Research and Technology - Hellas CULTURAL DOCUMENTATION: THE CLIO SYSTEM Panos Constantopoulos University of Crete and Foundation of Research and Technology - Hellas Institute of Computer Science Foundation of Research and Technology

More information

Categorizing Migrations

Categorizing Migrations What to Migrate? Categorizing Migrations A version control repository contains two distinct types of data. The first type of data is the actual content of the directories and files themselves which are

More information

1 Introduction to Networking

1 Introduction to Networking 1 Introduction to Networking 1.1 What are networks? That seems like an appropriate question to start with. Pretty much anything that s connected to anything else in some way can be described as a network.

More information

ICOM/CIDOC DATA MODEL WORKING GROUP. Graphic Data Model

ICOM/CIDOC DATA MODEL WORKING GROUP. Graphic Data Model ICOM/CIDOC DATA MODEL WORKING GROUP Graphic Data Model This graphic model covers the big four entities: objects, events, people, and roles It gives you an alternative way to access information about the

More information

Teiid Designer User Guide 7.5.0

Teiid Designer User Guide 7.5.0 Teiid Designer User Guide 1 7.5.0 1. Introduction... 1 1.1. What is Teiid Designer?... 1 1.2. Why Use Teiid Designer?... 2 1.3. Metadata Overview... 2 1.3.1. What is Metadata... 2 1.3.2. Editing Metadata

More information

Monitoring and Reporting Drafting Team Monitoring Indicators Justification Document

Monitoring and Reporting Drafting Team Monitoring Indicators Justification Document INSPIRE Infrastructure for Spatial Information in Europe Monitoring and Reporting Drafting Team Monitoring Indicators Justification Document Title Draft INSPIRE Monitoring Indicators Justification Document

More information

Component-Based Software Engineering TIP

Component-Based Software Engineering TIP Component-Based Software Engineering TIP X LIU, School of Computing, Napier University This chapter will present a complete picture of how to develop software systems with components and system integration.

More information

Metadata Workshop 3 March 2006 Part 1

Metadata Workshop 3 March 2006 Part 1 Metadata Workshop 3 March 2006 Part 1 Metadata overview and guidelines Amelia Breytenbach Ria Groenewald What metadata is Overview Types of metadata and their importance How metadata is stored, what metadata

More information

Metadata Management System (MMS)

Metadata Management System (MMS) Metadata Management System (MMS) Norhaizan Mat Talha MIMOS Berhad, Technology Park, Kuala Lumpur, Malaysia Mail:zan@mimos.my Abstract: Much have been said about metadata which is data about data used for

More information

PRINCIPLES AND FUNCTIONAL REQUIREMENTS

PRINCIPLES AND FUNCTIONAL REQUIREMENTS INTERNATIONAL COUNCIL ON ARCHIVES PRINCIPLES AND FUNCTIONAL REQUIREMENTS FOR RECORDS IN ELECTRONIC OFFICE ENVIRONMENTS RECORDKEEPING REQUIREMENTS FOR BUSINESS SYSTEMS THAT DO NOT MANAGE RECORDS OCTOBER

More information

The CEN Metalex Naming Convention

The CEN Metalex Naming Convention The CEN Metalex Naming Convention Fabio Vitali University of Bologna CEN Metalex CEN Metalex has been an international effort to create an interchange format between national XML formats for legislation.

More information

Metadata Issues in Long-term Management of Data and Metadata

Metadata Issues in Long-term Management of Data and Metadata Issues in Long-term Management of Data and S. Sugimoto Faculty of Library, Information and Media Science, University of Tsukuba Japan sugimoto@slis.tsukuba.ac.jp C.Q. Li Graduate School of Library, Information

More information

Infrastructure for Multilayer Interoperability to Encourage Use of Heterogeneous Data and Information Sharing between Government Systems

Infrastructure for Multilayer Interoperability to Encourage Use of Heterogeneous Data and Information Sharing between Government Systems Hitachi Review Vol. 65 (2016), No. 1 729 Featured Articles Infrastructure for Multilayer Interoperability to Encourage Use of Heterogeneous Data and Information Sharing between Government Systems Kazuki

More information

Annotation Science From Theory to Practice and Use Introduction A bit of history

Annotation Science From Theory to Practice and Use Introduction A bit of history Annotation Science From Theory to Practice and Use Nancy Ide Department of Computer Science Vassar College Poughkeepsie, New York 12604 USA ide@cs.vassar.edu Introduction Linguistically-annotated corpora

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

Generalized Document Data Model for Integrating Autonomous Applications

Generalized Document Data Model for Integrating Autonomous Applications 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Generalized Document Data Model for Integrating Autonomous Applications Zsolt Hernáth, Zoltán Vincellér Abstract

More information

Two interrelated objectives of the ARIADNE project, are the. Training for Innovation: Data and Multimedia Visualization

Two interrelated objectives of the ARIADNE project, are the. Training for Innovation: Data and Multimedia Visualization Training for Innovation: Data and Multimedia Visualization Matteo Dellepiane and Roberto Scopigno CNR-ISTI Two interrelated objectives of the ARIADNE project, are the design of new services (or the integration

More information

Automatic Interpretation of Natural Language for a Multimedia E-learning Tool

Automatic Interpretation of Natural Language for a Multimedia E-learning Tool Automatic Interpretation of Natural Language for a Multimedia E-learning Tool Serge Linckels and Christoph Meinel Department for Theoretical Computer Science and New Applications, University of Trier {linckels,

More information

The CASPAR Finding Aids

The CASPAR Finding Aids ABSTRACT The CASPAR Finding Aids Henri Avancini, Carlo Meghini, Loredana Versienti CNR-ISTI Area dell Ricerca di Pisa, Via G. Moruzzi 1, 56124 Pisa, Italy EMail: Full.Name@isti.cnr.it CASPAR is a EU co-funded

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library Nuno Freire Chief data officer The European Library Pacific Neighbourhood Consortium 2014 Annual

More information

RiMOM Results for OAEI 2009

RiMOM Results for OAEI 2009 RiMOM Results for OAEI 2009 Xiao Zhang, Qian Zhong, Feng Shi, Juanzi Li and Jie Tang Department of Computer Science and Technology, Tsinghua University, Beijing, China zhangxiao,zhongqian,shifeng,ljz,tangjie@keg.cs.tsinghua.edu.cn

More information

Progress report for revising and harmonising ICA descriptive standards

Progress report for revising and harmonising ICA descriptive standards Committee on Best Practices and Standards / Sub-committee on archival description Comité des normes et des bonnes pratiques / Sous-comité des normes de description Progress report for revising and harmonising

More information

Metadata in the Driver's Seat: The Nokia Metia Framework

Metadata in the Driver's Seat: The Nokia Metia Framework Metadata in the Driver's Seat: The Nokia Metia Framework Abstract Patrick Stickler The Metia Framework defines a set of standard, open and portable models, interfaces, and

More information

Global estandards and Web Architectures for egovernment projects José M. Alonso,

Global estandards and Web Architectures for egovernment projects José M. Alonso, Global estandards and Web Architectures for egovernment projects José M. Alonso, egovernment and W3C José M. Alonso CTIC Fellow - W3C egovernment Lead Technology and Society Domain 28 May

More information

Metadata Standards & Applications. 7. Approaches to Models of Metadata Creation, Storage, and Retrieval

Metadata Standards & Applications. 7. Approaches to Models of Metadata Creation, Storage, and Retrieval Metadata Standards & Applications 7. Approaches to Models of Metadata Creation, Storage, and Retrieval Goals for Session Understand the differences between traditional vs. digital library Metadata creation

More information

Software Quality. Chapter What is Quality?

Software Quality. Chapter What is Quality? Chapter 1 Software Quality 1.1 What is Quality? The purpose of software quality analysis, or software quality engineering, is to produce acceptable products at acceptable cost, where cost includes calendar

More information

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication Citation for published version: Patel, M & Duke, M 2004, 'Knowledge Discovery in an Agents Environment' Paper presented at European Semantic Web Symposium 2004, Heraklion, Crete, UK United Kingdom, 9/05/04-11/05/04,.

More information

An aggregation system for cultural heritage content

An aggregation system for cultural heritage content An aggregation system for cultural heritage content Nasos Drosopoulos, Vassilis Tzouvaras, Nikolaos Simou, Anna Christaki, Arne Stabenau, Kostas Pardalis, Fotis Xenikoudakis, Eleni Tsalapati and Stefanos

More information

Introduction to XML. XML: basic elements

Introduction to XML. XML: basic elements Introduction to XML XML: basic elements XML Trying to wrap your brain around XML is sort of like trying to put an octopus in a bottle. Every time you think you have it under control, a new tentacle shows

More information

Who we are: Kristin Martin, Metadata Librarian, Catalog Department Peter Hepburn, Digitization Librarian, Digital Programs Department

Who we are: Kristin Martin, Metadata Librarian, Catalog Department Peter Hepburn, Digitization Librarian, Digital Programs Department Introduction Who we are: Kristin Martin, Metadata Librarian, Catalog Department Peter Hepburn, Digitization Librarian, Digital Programs Department Many of the images in this presentation come from the

More information