11th International conference, 19-21 June
DIN Deutsches Institut für Normung e. V.
Modeling LMF compliant lexica in OWL-DL
Malek Lhioui (1), Kais Haddar (1) and Laurent Romary (2)
  (1) Multimedia, InfoRmation Systems and Advanced Computing Laboratory, Sfax, Tunisia
  (2) Inria, Berlin, Germany
Motivation
  Ensuring interoperability across lexical representation frameworks
    A recent TAUS report declared: "The lack of interoperability costs the translation industry a fortune"
    A fortune is spent on adjusting data formats
  Contribution of ontologies to building such a framework
    Information sharing and integration
    Information description: provision of clear semantics
Data modeling and ontologies
  Representing and managing the semantics of data formats
  Going beyond the capacities of existing schema languages (typing, distance and contextual checking)
  Representing models and instances in a coherent way
  Making inferences on the structures
  Coherence control of data sources
  Data completion
  Comparison across heterogeneous sources or formats
General architecture
  [Diagram] Ontological modeling
    Logical classes and constraints: TBox (terminological box)
    Logical instances: ABox (assertional box)
  [Diagram] XML data management
    XML documents
The LMF case
  ISO 24613:2008, Language resource management - Lexical markup framework (LMF)
  A semasiological meta-model for the representation of lexical data
    Complementary to TMF for terminological data
  Comes with a core package and extensions
    Morphological descriptions, syntactic constructs, semantics
  Provides an unsatisfactory XML serialization
    Great variation in LMF usages
    Large lexical projects work with the TEI
Objectives
  Automatic generation of ontologies for standardized lexical resources
    No resort to manual editing
    Preservation of lexical properties (attributes, cardinalities, relations)
  Contribution to the building of a lexical semantic web (cf. OntoLex initiative @ W3C)
  Application to the Arabic language
    Ontology construction for Arabic is very deficient
    Cf. Baccar et al.: existing LMF compliant Arabic lexical editor
Main technical aspects
  OWL-DL as the optimal modeling framework
    Good compromise between expressive power and inference capabilities (compared with OWL Lite and OWL Full)
  Based on a sound logical background
    DL: description logics (cf. Ian Horrocks)
    Well defined (model theoretic) semantics
    Formal properties well understood (complexity, decidability)
    Known reasoning algorithms
    Implemented systems (highly optimised)
MODELING LMF IN OWL-DL
Transformation Prototype
  Basic entries
    Building OWL-DL Entities
    Used Namespaces
  Basic elements
    LMF Header and Classes
    LMF SubClasses
  Restrictions
    LMF Properties
    LMF Relations
    LMF Cardinalities
Building OWL-DL Entities
  W3C: the set of entities is usually said to constitute the signature of an ontology
  Entity declarations simplify some entries in OWL modeling
    <!DOCTYPE rdf:RDF [
      <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
      <!ENTITY owl "http://www.w3.org/2002/07/owl#">
      <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
    ]>
Used Namespaces
  Unambiguous ontology
  Deduced directly from the entities
    <rdf:Description
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:lmf="http://www.lexicalmarkupframework.org#">
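The role of the entity and namespace declarations can be sketched in a few lines of Python. This is only an illustration, not the authors' prototype: the `NAMESPACES` table copies the IRIs from the slides, while the helper names `expand` and `new_ontology_root` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix table mirroring the entity and namespace declarations on the slides.
NAMESPACES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "lmf": "http://www.lexicalmarkupframework.org#",
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'lmf:lexicon' into a full IRI (hypothetical helper)."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


def new_ontology_root() -> ET.Element:
    """Create an empty rdf:RDF root and register the prefixes for serialization."""
    for prefix, iri in NAMESPACES.items():
        ET.register_namespace(prefix, iri)
    return ET.Element(f"{{{NAMESPACES['rdf']}}}RDF")


root = new_ontology_root()
lexicon_iri = expand("lmf:lexicon")
```

Each prefix resolves to exactly one IRI, which is what keeps the generated names unambiguous.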
LMF Header and Classes
  Input (LMF document):
    <?xml version="1.0" encoding="utf-8"?>
    <lexicalresource dtdversion="16">
      <globalinformation>
        <feat att="languagecoding" val="iso 639-3"/>
        <feat att="scriptcoding" val="iso 15924"/>
      </globalinformation>
      <lexicon>
        <feat att="language" val="arab"/>
        <lexicalentry morphologicalpatterns="intransitiveverb">
          <feat att="partofspeech" val="verb"/>
          <feat att="root" val="ج-ل-س (ja/la/ssa)"/>
          <feat att="scheme" val="فعل (faala)"/>
          <lemma>
            <feat att="writtenform" val="جلس (jalassa)"/>
            <feat att="writtenform" val="-"/>
            <feat att="type" val="صحيح (sahih)"/>
          </lemma>
        </lexicalentry>
      </lexicon>
    </lexicalresource>
  Diagram (class generation):
    Input -> Verification: existence in LMF classes? (No / Yes)
    If yes -> Add Class (example: <owl:Class/>)
           -> Add attribute rdf:about="&entity;input" (example: <owl:Class rdf:about="&lmf;lexicon"/>)
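The class-generation step (check that an input element is an LMF class, then add an `owl:Class` with an `rdf:about` attribute) might look roughly like the sketch below. It rests on several assumptions: `LMF_CLASSES` is a small hand-picked subset of the LMF classes, names are lowercased as on the slides, and the function name `generate_classes` is hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LMF = "http://www.lexicalmarkupframework.org#"

# Assumption: a small subset of LMF class names, lowercased as on the slides.
LMF_CLASSES = {"lexicalresource", "globalinformation", "lexicon", "lexicalentry", "lemma"}

LMF_SAMPLE = """
<lexicalresource dtdversion="16">
  <globalinformation><feat att="languagecoding" val="iso 639-3"/></globalinformation>
  <lexicon>
    <lexicalentry><lemma><feat att="writtenform" val="jalassa"/></lemma></lexicalentry>
  </lexicon>
</lexicalresource>
"""


def generate_classes(lmf_xml: str) -> ET.Element:
    """Emit one owl:Class per distinct LMF class found in the input document."""
    ontology = ET.Element(f"{{{RDF}}}RDF")
    declared = set()
    for element in ET.fromstring(lmf_xml).iter():
        name = element.tag.lower()
        # Verification step: only names known as LMF classes become OWL classes,
        # and each class is declared once.
        if name in LMF_CLASSES and name not in declared:
            declared.add(name)
            ET.SubElement(ontology, f"{{{OWL}}}Class", {f"{{{RDF}}}about": LMF + name})
    return ontology


ontology = generate_classes(LMF_SAMPLE)
class_iris = [cls.get(f"{{{RDF}}}about") for cls in ontology]
```

The `feat` elements are skipped here because they carry attribute values rather than classes.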
LMF Classes: Automatic Generation in OWL-DL
  [Figure: hierarchical structure and graphic structure of the automatically built ontology]
  Lexical Resource (dtdversion="16")
    1 Global Information (languagecoding="iso 639-3", scriptcoding="iso 15924")
    1..* Lexicon (language="arabic")
      1..* Lexical Entry (morphologicalpatterns="intransitiveverb", partofspeech="verb", root="ج_ل_س" (jalassa), scheme="فعل" (faala))
        1 Lemma (writtenform="جلس" (jalassa), type="صحيح" (sahih))
LMF SubClasses
  Input: FormRepresentation
    <FormRepresentation>
      <feat att="writtenform" val="women"/>
      <feat att="grammaticalnumber" val="plural"/>
    </FormRepresentation>
  Output:
    <owl:Class rdf:about="&lmf;formrepresentation">
      <rdfs:subClassOf rdf:resource="&lmf;representation"/>
    </owl:Class>
  Diagram (subclass generation):
    Input -> Verification: existence in LMF subclasses? (No / Yes)
    If yes -> Add Class + rdf:about="&entity;input" (example: <owl:Class rdf:about="&lmf;formrepresentation"/>)
           -> Add mother class + rdf:resource="&entity;motherclass" (example: <rdfs:subClassOf rdf:resource="&lmf;representation"/>)
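A minimal sketch of the subclass step, assuming a lookup table from subclass to mother class. Only the `formrepresentation` / `representation` pair comes from the slide; the table name `MOTHER_CLASS` and the function `add_subclass` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LMF = "http://www.lexicalmarkupframework.org#"

# Assumption: subclass -> mother class table; only this pair appears on the slide.
MOTHER_CLASS = {"formrepresentation": "representation"}


def add_subclass(ontology: ET.Element, element_name: str) -> Optional[ET.Element]:
    """Add an owl:Class with its rdfs:subClassOf link if the name is a known subclass."""
    name = element_name.lower()
    if name not in MOTHER_CLASS:
        return None  # verification failed: not an LMF subclass
    cls = ET.SubElement(ontology, f"{{{OWL}}}Class", {f"{{{RDF}}}about": LMF + name})
    ET.SubElement(
        cls,
        f"{{{RDFS}}}subClassOf",
        {f"{{{RDF}}}resource": LMF + MOTHER_CLASS[name]},
    )
    return cls


ontology = ET.Element(f"{{{RDF}}}RDF")
form_class = add_subclass(ontology, "FormRepresentation")
unknown = add_subclass(ontology, "Lexicon")
```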
LMF Properties
  Input:
    <GlobalInformation>
      <feat att="languagecoding" val="iso 639-3"/>
      <feat att="scriptcoding" val="iso 15924"/>
    </GlobalInformation>
  Output:
    <owl:DatatypeProperty rdf:about="&lmf;languagecoding">
      <rdfs:range rdf:resource="&xsd;string"/>
      <rdfs:domain rdf:resource="&lmf;globalinformation"/>
    </owl:DatatypeProperty>
    <owl:DatatypeProperty rdf:about="&lmf;scriptcoding">
      <rdfs:range rdf:resource="&xsd;string"/>
      <rdfs:domain rdf:resource="&lmf;globalinformation"/>
    </owl:DatatypeProperty>
  Diagram (property generation):
    Input -> Verification: existence in LMF attributes? (No / Yes)
    If yes -> Add DatatypeProperty + rdf:about="&entity;input"
           -> Add range + domain, rdf:resource="&entity;range or domain"
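The mapping from `feat` attributes to datatype properties can be sketched as follows. The sketch assumes that every `feat` value is typed as `xsd:string` (as in the slide output) and that the domain is the lowercased name of the enclosing element; the function name `datatype_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
LMF = "http://www.lexicalmarkupframework.org#"


def datatype_properties(lmf_element: ET.Element) -> list:
    """Build one owl:DatatypeProperty per distinct <feat att="..."> child."""
    domain = LMF + lmf_element.tag.lower()
    properties = []
    seen = set()
    for feat in lmf_element.findall("feat"):
        att = feat.get("att")
        if att is None or att in seen:
            continue  # skip malformed or repeated attributes
        seen.add(att)
        prop = ET.Element(f"{{{OWL}}}DatatypeProperty", {f"{{{RDF}}}about": LMF + att})
        # Assumption: all values are plain strings, as in the slide output.
        ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": XSD + "string"})
        ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": domain})
        properties.append(prop)
    return properties


source = ET.fromstring(
    "<GlobalInformation>"
    '<feat att="languagecoding" val="iso 639-3"/>'
    '<feat att="scriptcoding" val="iso 15924"/>'
    "</GlobalInformation>"
)
props = datatype_properties(source)
```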
LMF attributes: Automatic Generation in OWL-DL
  [Figure: UML diagram of Lexical Resource, Global Information, Lexicon, Lexical Entry and Lemma; the attributes of each class (dtdversion, language, languagecoding, scriptcoding, morphologicalpatterns, partofspeech, root, scheme, writtenform, type) are generated as Data Properties]
LMF Relations
  Input: aggregation between Lexical Resource and Lexicon
  Output:
    <owl:ObjectProperty rdf:about="&lmf;has_lexicon">
      <rdfs:domain rdf:resource="&lmf;lexicalresource"/>
      <rdfs:range rdf:resource="&lmf;lexicon"/>
    </owl:ObjectProperty>
  Diagram (relation generation):
    Input -> Verification: existence in LMF aggregation? (No / Yes)
    If yes -> Add ObjectProperty + rdf:about="&entity;input"
           -> Add range + domain, rdf:resource="&entity;range or domain"
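One way to derive the aggregation relations is to read them off the nesting of the LMF document: each parent/child pair yields an object property with the parent as domain and the child as range. The sketch below assumes the `has_<child>` naming seen in the slide output (`has_lexicon`) and lowercased names; `aggregation_properties` is a hypothetical function name.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LMF = "http://www.lexicalmarkupframework.org#"


def aggregation_properties(lmf_root: ET.Element) -> dict:
    """Map each parent/child nesting to an owl:ObjectProperty named has_<child>."""
    properties = {}
    for parent in lmf_root.iter():
        for child in parent:
            if child.tag == "feat":
                continue  # feat elements carry attribute values, not relations
            name = "has_" + child.tag.lower()
            if name in properties:
                continue
            prop = ET.Element(f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}about": LMF + name})
            ET.SubElement(
                prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": LMF + parent.tag.lower()}
            )
            ET.SubElement(
                prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": LMF + child.tag.lower()}
            )
            properties[name] = prop
    return properties


lmf_root = ET.fromstring(
    "<lexicalresource><globalinformation/>"
    "<lexicon><lexicalentry/></lexicon></lexicalresource>"
)
relations = aggregation_properties(lmf_root)
```

A real converter would probably take the relations from the LMF meta-model rather than from a single instance document; the nesting-based shortcut is only meant to show the domain/range assignment.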
LMF Relations: Automatic Generation in OWL-DL
  [Figure: UML diagram of Lexical Resource, Global Information, Lexicon, Lexical Entry and Lemma; the aggregation links between the classes (Lexical Resource to Global Information and Lexicon, Lexicon to Lexical Entry, Lexical Entry to Lemma) are generated as Object Properties]
LMF Cardinalities
  Input: cardinalities between Lexical Resource and Lexicon
  Output:
    <owl:Class rdf:about="&lmf;lexicalresource">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="&lmf;has_lexicon"/>
          <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
  Diagram (cardinality generation):
    Input -> Verification: existence in LMF cardinalities? (No / Yes)
    If yes -> Add Restriction
           -> Add onProperty + minCardinality, rdf:resource="&entity;input", rdf:datatype="&entity;type"
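The translation of a UML multiplicity into an OWL restriction could be sketched as below. The mapping is an assumption rather than something stated on the slides: `1..*` becomes `owl:minCardinality 1` (matching the slide output) and a bare `1` becomes an exact `owl:cardinality`. The function name `cardinality_restriction` is hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
LMF = "http://www.lexicalmarkupframework.org#"


def cardinality_restriction(class_name: str, property_name: str, multiplicity: str) -> ET.Element:
    """Wrap a cardinality restriction in rdfs:subClassOf on the given class."""
    lower, _, upper = multiplicity.partition("..")
    if upper == "":
        tag = "cardinality"      # assumption: "1" means exactly one
    elif upper == "*":
        tag = "minCardinality"   # "1..*" means at least one
    else:
        raise ValueError(f"unsupported multiplicity: {multiplicity}")

    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": LMF + class_name})
    sub = ET.SubElement(cls, f"{{{RDFS}}}subClassOf")
    restriction = ET.SubElement(sub, f"{{{OWL}}}Restriction")
    ET.SubElement(
        restriction, f"{{{OWL}}}onProperty", {f"{{{RDF}}}resource": LMF + property_name}
    )
    bound = ET.SubElement(
        restriction,
        f"{{{OWL}}}{tag}",
        {f"{{{RDF}}}datatype": XSD + "nonNegativeInteger"},
    )
    bound.text = str(int(lower))
    return cls


lexicon_rule = cardinality_restriction("lexicalresource", "has_lexicon", "1..*")
lemma_rule = cardinality_restriction("lexicalentry", "has_lemma", "1")
```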
INSTANTIATION EXAMPLE
LMF Instantiation
  Input:
    <?xml version="1.0" encoding="utf-8"?>
    <lexicalresource dtdversion="16">
      <globalinformation>
        <feat att="languagecoding" val="iso 639-3"/>
        <feat att="scriptcoding" val="iso 15924"/>
      </globalinformation>
      <lexicon>
        <feat att="language" val="arab"/>
        <lexicalentry morphologicalpatterns="intransitiveverb">
          <feat att="partofspeech" val="verb"/>
          <feat att="root" val="ج-ل-س (ja/la/ssa)"/>
          <feat att="scheme" val="فعل (faala)"/>
          <lemma>
            <feat att="writtenform" val="جلس (jalassa)"/>
            <feat att="writtenform" val="-"/>
            <feat att="type" val="صحيح (sahih)"/>
          </lemma>
        </lexicalentry>
      </lexicon>
    </lexicalresource>
  Diagram (instance generation):
    Input -> Verification: existence of the instance? (No / Yes)
    If yes -> Add Individual + rdf:about="&entity;input"
           -> Add type + Data Properties + Object Properties assertions
Instantiation Example
    <owl:NamedIndividual rdf:about="lmfcoreontologies.owl#lexicon">
      <rdf:type rdf:resource="lmfcoreontologies.owl#lexicon"/>
      <language rdf:datatype="&xsd;string">arabic</language>
      <has-lexicalentry rdf:resource="lmfcoreontologies.owl#kataba"/>
    </owl:NamedIndividual>
Instantiation Example (cont.)
    <!-- https://sites.google.com/site/aclglabo/outils/lmfontologies.owl#globalinformation -->
    <owl:NamedIndividual rdf:about="&lmfontologies;globalinformation">
      <rdf:type rdf:resource="&lmfontologies;globalinformation"/>
      <scriptcoding rdf:datatype="&xsd;string">iso 15924</scriptcoding>
      <languagecoding rdf:datatype="&xsd;string">iso 639-3</languagecoding>
    </owl:NamedIndividual>
    <!-- https://sites.google.com/site/aclglabo/outils/lmfontologies.owl#jalassa -->
    <owl:NamedIndividual rdf:about="&lmfontologies;jalassa">
      <rdf:type rdf:resource="&lmfontologies;lexicalentry"/>
      <morphologicalpatterns rdf:datatype="&xsd;string">intransitiveverb</morphologicalpatterns>
      <partofspeech rdf:datatype="&xsd;string">verb</partofspeech>
      <root rdf:datatype="&xsd;string">ج_ل_س</root>
      <scheme rdf:datatype="&xsd;string">فعل</scheme>
    </owl:NamedIndividual>
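The instantiation step (add an individual, its type, and its data property assertions) might be sketched like this. Several points are assumptions: the individual name is supplied by the caller, XML attributes and `feat` children are both turned into string-typed assertions, and the LMF namespace from the namespaces slide is used in place of the `&lmfontologies;` entity. The function name `make_individual` is hypothetical, and object property assertions are left out for brevity.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
LMF = "http://www.lexicalmarkupframework.org#"


def make_individual(lmf_element: ET.Element, individual_name: str) -> ET.Element:
    """Build an owl:NamedIndividual typed by the element's class, with data assertions."""
    individual = ET.Element(
        f"{{{OWL}}}NamedIndividual", {f"{{{RDF}}}about": LMF + individual_name}
    )
    ET.SubElement(
        individual, f"{{{RDF}}}type", {f"{{{RDF}}}resource": LMF + lmf_element.tag.lower()}
    )
    # Collect (property, value) pairs from XML attributes first, then feat children.
    pairs = list(lmf_element.attrib.items())
    pairs += [(feat.get("att"), feat.get("val")) for feat in lmf_element.findall("feat")]
    for name, value in pairs:
        assertion = ET.SubElement(
            individual, f"{{{LMF}}}{name}", {f"{{{RDF}}}datatype": XSD + "string"}
        )
        assertion.text = value
    return individual


entry = ET.fromstring(
    '<lexicalentry morphologicalpatterns="intransitiveverb">'
    '<feat att="partofspeech" val="verb"/>'
    "</lexicalentry>"
)
jalassa = make_individual(entry, "jalassa")
```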
LMF Instantiation: Automatic Generation in OWL-DL
  [Figure: UML diagram of Lexical Resource, Global Information, Lexicon, Lexical Entry and Lemma with the values of the example entry (languagecoding="iso 639-3", scriptcoding="iso 15924", language="arabic", partofspeech="verb", writtenform="جلس" (jalassa), type="صحيح" (sahih)); the instances are generated as Individuals]
Conclusion and Perspectives
  Study of the structure and representation of the LMF model
  Design of an OWL-DL ontology that matches its components as closely as possible
  Makes OWL-DL lexicons in any language easier to build
  Perspective: an interoperable framework for future developments, modeling a family of interoperable formats