Syntactic Tree-based Relation Extraction Using a Generalization of Collins and Duffy Convolution Tree Kernel


Mahdy Khayyamian, Seyed Abolghasem Mirroshandel, Hassan Abolhassani
Sharif University of Technology
khayyamian@ce.sharif.edu, mirroshandel@ce.sharif.edu, abolhassani@sharif.edu

Abstract

Relation extraction is a challenging task in natural language processing. Syntactic features have recently been shown to be quite effective for relation extraction. In this paper, we generalize the state-of-the-art syntactic convolution tree kernel introduced by Collins and Duffy. The proposed generalized kernel is more flexible and customizable, and can be conveniently utilized for the systematic generation of more effective, application-specific syntactic sub-kernels. Using the generalized kernel, we also propose a number of novel syntactic sub-kernels for relation extraction. These kernels show a remarkable performance improvement over the original Collins and Duffy kernel in the extraction of ACE-2005 relation types.

1 Introduction

One of the contemporary demanding NLP tasks is information extraction, the procedure of extracting structured information such as entities, relations, and events from free-text documents. As an information extraction sub-task, semantic relation extraction is the procedure of finding predefined semantic relations between textual entity mentions. For instance, assuming a semantic relation with type Physical and subtype Located between an entity of type Person and another entity of type Location, the sentence "Police arrested Mark at the airport last week." conveys two mentions of this relation, between "Mark" and "airport" and between "police" and "airport", which can be shown in the following format:

Phys.Located(Mark, airport)
Phys.Located(police, airport)

Relation extraction is a key step towards question answering systems, by which vital structured data is acquired from underlying free-text resources. Detection of protein interactions in biomedical corpora (Li et al., 2008) is another valuable application of relation extraction.
Relation extraction can be approached as a standard classification learning problem. We particularly use SVMs (Boser et al., 1992; Cortes and Vapnik, 1995) and kernel functions as our classification method. A kernel is a function that calculates the inner product of two transformed vectors of a high-dimensional feature space using the original feature vectors, as shown in eq. (1):

K(X_i, X_j) = φ(X_i) · φ(X_j)    (1)

Kernel functions can implicitly capture a large number of features efficiently; thus, they have been widely used in various NLP tasks. Various types of features have been exploited so far for relation extraction. In (Bunescu and Mooney, 2005b), word sequence features are utilized using a subsequence kernel. In (Bunescu and Mooney, 2005a), dependency graph features are exploited, and in (Zhang et al., 2006a), syntactic features are employed for relation extraction. Although a proper combination of these features is necessary to achieve the best performance (Zhou et al., 2005), in this paper we concentrate on how to better capture syntactic features for relation extraction.

Proceedings of the NAACL HLT 2009 Student Research Workshop and Doctoral Consortium, pages 66-71, Boulder, Colorado, June 2009. © 2009 Association for Computational Linguistics
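Eq. (1) can be made concrete with a toy example (not from the paper): for a degree-2 polynomial kernel, evaluating the kernel on the original vectors gives the same value as explicitly mapping into the higher-dimensional feature space first. The feature map `phi` below is illustrative only.

```python
# Toy illustration of eq. (1): a kernel computes <phi(x_i), phi(x_j)>
# without materializing the feature map phi explicitly.

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-d vector."""
    x1, x2 = x
    return [x1 * x1, x2 * x2, 2 ** 0.5 * x1 * x2]

def k_explicit(xi, xj):
    """Inner product in the transformed space (expensive in general)."""
    return sum(a * b for a, b in zip(phi(xi), phi(xj)))

def k_implicit(xi, xj):
    """Polynomial kernel (x_i . x_j)^2 -- same value, no feature map."""
    dot = sum(a * b for a, b in zip(xi, xj))
    return dot ** 2

xi, xj = (1.0, 2.0), (3.0, 0.5)
assert abs(k_explicit(xi, xj) - k_implicit(xi, xj)) < 1e-9  # both give 16.0
```

The same principle is what makes the convolution tree kernels below tractable: the feature space of all sub-trees is never enumerated.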

In CD'01 (Collins and Duffy, 2001), a convolution syntactic tree kernel is proposed that generically measures the syntactic similarity between parse trees. In this paper, a generalized version of the CD'01 convolution tree kernel is proposed by associating generic weights with the nodes and sub-trees of the parse tree. These weights can be used to incorporate domain knowledge into the kernel and make it more flexible and customizable. The generalized kernel can be conveniently used to generate a variety of syntactic sub-kernels (including the original CD'01 kernel) by adopting appropriate weighting mechanisms. As a result, in this paper, novel syntactic sub-kernels are generated from the generalized kernel for the task of relation extraction. Evaluations demonstrate that these kernels outperform the original CD'01 kernel in the extraction of the ACE-2005 main relation types.

The remainder of this paper is structured as follows. In section 2, the most related works are briefly reviewed. In section 3, the CD'01 tree kernel is described. The proposed generalized convolution tree kernel is explained in section 4, and the sub-kernels it produces for relation extraction are illustrated in section 5. The experimental results are discussed in section 6. Our work is concluded in section 7, and some possible future works are presented in section 8.

2 Related Work

In (Collins and Duffy, 2001), a convolution parse tree kernel was introduced. This kernel is generally designed to measure the syntactic similarity between parse trees and was especially exploited for parsing English sentences in their paper. Since then, the kernel has been widely used in different applications such as semantic role labeling (Moschitti, 2006b) and relation extraction (Zhang et al., 2006a; Zhang et al., 2006b; Zhou et al., 2007; Li et al., 2008). In (Zhang et al., 2006a), this convolution tree kernel was used for relation extraction for the first time.
Since the whole syntactic parse tree of the sentence that holds the relation arguments contains plenty of misleading features, several parse tree portions have been studied to find the most feature-rich portion of the syntactic tree for relation extraction, and the Path-Enclosed Tree (PT) was finally found to be the best-performing tree portion. PT is the portion of the parse tree that is enclosed by the shortest path between the two relation arguments. Moreover, this tree kernel is combined with an entity kernel to form a reportedly high-quality composite kernel in (Zhang et al., 2006b).

3 CD'01 Convolution Tree Kernel

In (Collins and Duffy, 2001), a convolution tree kernel is introduced that measures the syntactic similarity between parse trees. This kernel computes the inner products of the following feature vector:

H(T) = (λ^{size_1} #subtree_1(T), ..., λ^{size_i} #subtree_i(T), ..., λ^{size_n} #subtree_n(T)),    0 < λ ≤ 1    (2)

Each feature of this vector is the occurrence count of a sub-tree type in the parse tree, decayed exponentially by the parameter λ. Without this decaying mechanism, which is used to retain the kernel values within a fairly small range, the value of the kernel for identical trees becomes far higher than its value for different trees. The term size_i is defined to be the number of rules, or internal nodes, of the i-th sub-tree type. Samples of such sub-trees are shown in Fig. 1 for a simple parse tree. Since the number of sub-trees of a tree is exponential in its size (Collins and Duffy, 2001), direct inner product calculation is computationally infeasible. Consequently, Collins and Duffy (2001) proposed an ingenious kernel function that implicitly calculates the inner product in O(N_1 × N_2) time on trees of size N_1 and N_2.

4 A Generalized Convolution Tree Kernel

In order to describe the kernel, a feature vector over the syntactic parse tree is firstly defined in eq. (3), in which the i-th feature equals the weighted sum of the number of instances of sub-tree type i in the tree. The function I_{subtree_i}(n) is an indicator function that returns 1 if subtree_i occurs with its root at node n and 0 otherwise. As described in eq. (4),

the function tw(T) (which stands for "tree weight") assigns a weight to a tree T equal to the product of the weights of all its nodes:

H(T) = (Σ_{n∈T} [I_{subtree_1}(n) × tw(subtree_1(n))], ..., Σ_{n∈T} [I_{subtree_i}(n) × tw(subtree_i(n))], ..., Σ_{n∈T} [I_{subtree_m}(n) × tw(subtree_m(n))])    (3)

tw(T) = Π_{n∈InternalNodes(T)} inw(n) × Π_{n∈ExternalNodes(T)} enw(n)    (4)

Figure 1. Samples of sub-trees used in convolution tree kernel calculation.

Since each node of the whole syntactic tree can occur either as an internal node or as an external node of a supposed sub-tree (presuming its existence in the sub-tree), two types of weights are respectively associated with each node by the functions inw and enw (which respectively stand for "internal node weight" and "external node weight"). For instance, in Fig. 1, a node can be an external node for some sub-trees (e.g., sub-tree (7)) while being an internal node of others (e.g., sub-trees (3) and (4)).

K(T_1, T_2) = <H(T_1), H(T_2)>
    = Σ_i [Σ_{n_1∈T_1} I_{subtree_i}(n_1) × tw(subtree_i(n_1))] × [Σ_{n_2∈T_2} I_{subtree_i}(n_2) × tw(subtree_i(n_2))]
    = Σ_{n_1∈T_1} Σ_{n_2∈T_2} Σ_i [I_{subtree_i}(n_1) × I_{subtree_i}(n_2) × tw(subtree_i(n_1)) × tw(subtree_i(n_2))]
    = Σ_{n_1∈T_1} Σ_{n_2∈T_2} C_gc(n_1, n_2)    (5)

As shown in eq. (5), a procedure similar to that of (Collins and Duffy, 2001) can be employed to develop a kernel function for the calculation of dot products on H(T) vectors. According to eq. (5), the calculation of the kernel finally leads to the summation of the C_gc(n_1, n_2) function over all tree node pairs of T_1 and T_2. The function C_gc(n_1, n_2) is the weighted sum of the common sub-trees rooted at n_1 and n_2, and can be recursively computed in a similar way to the function C(n_1, n_2) of (Collins and Duffy, 2001), as follows.

(1) If the production rules of nodes n_1 and n_2 are different, then
    C_gc(n_1, n_2) = 0;

(2) else if n_1 and n_2 are the same pre-terminals (the same parts of speech), then
    C_gc(n_1, n_2) = inw(n_1) × enw(child(n_1)) × inw(n_2) × enw(child(n_2));

(3) else if both n_1 and n_2 have the same production rules, then
    C_gc(n_1, n_2) = inw(n_1) × inw(n_2) × Π_i [enw(child_i(n_1)) × enw(child_i(n_2)) + C_gc(child_i(n_1), child_i(n_2))].

In the first case, when the two nodes represent different production rules, they accordingly cannot have any sub-trees in common. In the second case, there is exactly one common sub-tree of size two.
It should be noted that all leaf nodes of the tree (the words of the sentence) are considered identical in the calculation of the tree kernel. The value of the function in this case is the weight of this single common sub-tree. In the third case, when the nodes represent the same production rules, the weighted sum of the common sub-trees is calculated recursively. The equation holds because the existence of common sub-trees rooted at n_1 and n_2 implies the existence of common sub-trees rooted at their corresponding children, which can be combined multiplicatively to form their parents' common sub-trees. Due to the equivalent procedure of kernel calculation, this generalized version of the tree kernel preserves the nice O(N_1 × N_2) time complexity property of the original kernel. It is worthy of note that in (Moschitti, 2006a) a sorting-based method is proposed for the fast implementation of such tree kernels, which reduces the average running time to O(N_1 + N_2). The generalized kernel can be converted to the CD'01 kernel by defining inw(n) = λ and enw(n) = 1. Likewise, other definitions can be utilized to produce other useful sub-kernels.
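The recursion above can be sketched in Python. The `(label, children)` tuple encoding of parse trees and the helper names are my own assumptions, not code from the paper; the three cases follow the recursive definition of C_gc, and leaf words are treated as identical, as noted above.

```python
# Sketch of the generalized kernel recursion C_gc(n1, n2) from section 4.
# A parse-tree node is a (label, [children]) tuple; a leaf word is (word, []).

def production(node):
    """A node's production rule: its label plus the labels of its children."""
    label, children = node
    return (label, tuple(child[0] for child in children))

def is_preterminal(node):
    """A pre-terminal (POS tag) dominates exactly one leaf word."""
    return len(node[1]) == 1 and not node[1][0][1]

def c_gc(n1, n2, inw, enw):
    # Case 2: same pre-terminals -- one common sub-tree of size two.
    # Leaf words are deliberately ignored (treated as identical).
    if is_preterminal(n1) and is_preterminal(n2):
        if n1[0] != n2[0]:
            return 0.0
        return inw(n1) * enw(n1[1][0]) * inw(n2) * enw(n2[1][0])
    # Case 1: different production rules -- no common sub-trees.
    if production(n1) != production(n2):
        return 0.0
    # Case 3: same production rules -- combine children multiplicatively.
    result = inw(n1) * inw(n2)
    for c1, c2 in zip(n1[1], n2[1]):
        result *= enw(c1) * enw(c2) + c_gc(c1, c2, inw, enw)
    return result

def kernel(t1, t2, inw, enw):
    """Eq. (5): sum C_gc over all pairs of non-leaf nodes of the two trees."""
    def nodes(t):
        yield t
        for c in t[1]:
            if c[1]:  # skip leaf words
                yield from nodes(c)
    return sum(c_gc(a, b, inw, enw) for a in nodes(t1) for b in nodes(t2))
```

With `inw = lambda n: LAMBDA` and `enw = lambda n: 1.0`, this reproduces the CD'01 setting described above; other `inw`/`enw` definitions yield the sub-kernels of section 5.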

5 Kernels for Relation Extraction

In this section, three sub-kernels of the generalized convolution tree kernel are proposed for relation extraction. Using the embedded weights of the generalized kernel, these sub-kernels differentiate among sub-trees based on their expected relevance to semantic relations. More specifically, sub-trees are weighted according to how their nodes interact with the arguments of the relation.

5.1 Argument Ancestor Path Kernel (AAP)

The definition of the weighting functions is shown in eqs. (6) and (7). The parameter 0 < α ≤ 1 is a decaying parameter similar to λ.

inw(n) = α  if n is on an argument ancestor path or a direct child of a node on it; 0 otherwise    (6)

enw(n) = 1  if n is on an argument ancestor path or a direct child of a node on it; 0 otherwise    (7)

This weighting method is equivalent to applying the CD'01 tree kernel (by setting λ = α) on the portion of the parse tree that exclusively includes the arguments' ancestor nodes and their direct children.

5.2 Argument Ancestor Path Distance Kernel (AAPD)

inw(n) = α^{Min(AAPDist(n, arg_1), AAPDist(n, arg_2)) / MAX_DIST}    (8)

enw(n) = α^{Min(AAPDist(n, arg_1), AAPDist(n, arg_2)) / MAX_DIST}    (9)

The definition of the weighting functions is shown in eqs. (8) and (9); both functions have identical definitions for this kernel. The function AAPDist(n, arg) calculates the distance of node n from the ancestor path of argument arg on the parse tree, as illustrated by Fig. 2. MAX_DIST, the maximum of AAPDist(n, arg) in the whole tree, is used for normalization. In this way, the closer a tree node is to one of the arguments' ancestor paths, the less it is decayed by this weighting method.

5.3 Threshold-Sensitive Argument Ancestor Path Distance Kernel (TSAAPD)

This kernel is intuitively similar to the previous kernel but uses a rough threshold-based decaying technique instead of a smooth one. The definition of the weighting functions is shown in eqs. (10) and (11); both functions are again identical in this case.

inw(n) = 1 if AAPDist ≤ Threshold; α if AAPDist > Threshold    (10)

enw(n) = 1 if AAPDist ≤ Threshold; α if AAPDist > Threshold    (11)

6 Experiments

6.1 Experimental Settings

The proposed kernels are evaluated on the ACE-2005 multilingual corpus (Walker et al., 2006).
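The three weighting schemes of section 5 can be sketched as small weighting-function factories. The `on_path` predicate and the precomputed distance function `d(n) = Min(AAPDist(n, arg_1), AAPDist(n, arg_2))` are assumed helpers, not code from the paper, and the constants are example values only.

```python
# Sketches of the AAP, AAPD, and TSAAPD weighting schemes of section 5.
# d(n) is an assumed helper returning the node's minimum distance to an
# argument ancestor path; on_path(n) is an assumed membership predicate.

ALPHA = 0.44      # decay parameter (the value tuned in section 6.1)
MAX_DIST = 10.0   # normalizer: max AAPDist over the whole tree (example value)

def aap_weights(on_path):
    """AAP, eqs. (6)-(7): keep only nodes on an argument ancestor path
    (or direct children of such nodes); decay internal nodes by alpha."""
    inw = lambda n: ALPHA if on_path(n) else 0.0
    enw = lambda n: 1.0 if on_path(n) else 0.0
    return inw, enw

def aapd_weights(d):
    """AAPD, eqs. (8)-(9): smooth decay by normalized distance."""
    w = lambda n: ALPHA ** (d(n) / MAX_DIST)
    return w, w  # inw and enw are identical

def tsaapd_weights(d, threshold):
    """TSAAPD, eqs. (10)-(11): full weight up to the threshold, alpha beyond."""
    w = lambda n: 1.0 if d(n) <= threshold else ALPHA
    return w, w
```

Any of these pairs can be passed as the `inw`/`enw` arguments of a generalized-kernel implementation; TSAAPD-0 and TSAAPD-1 in the experiments correspond to `threshold=0` and `threshold=1`.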
In order to avoid parsing problems, the more formal parts of the corpus in the "news wire" and "broadcast news" sections are used for evaluation, as in (Zhang et al., 2006b).

Figure 2. The syntactic parse tree of the sentence "Police arrested Mark at the airport last week", which conveys a Phys.Located(Mark, airport) relation. The ancestor path of the argument "airport" (dashed curve) and the distance of the node of "Mark" from it (dotted curve) are shown.

Kernel    | PER-SOC | ART  | GEN-AFF | ORG-AFF | PART-WHOLE | PHYS
CD'01     | 0.6     | 0.5  | 0.09    | 0.43    | 0.30       | 0.3
AAP       | 0.58    | 0.49 | 0.0     | 0.43    | 0.8        | 0.36
AAPD      | 0.70    | 0.50 | 0.      | 0.43    | 0.9        | 0.9
TSAAPD-0  | 0.63    | 0.48 | 0.      | 0.43    | 0.30       | 0.33
TSAAPD-1  | 0.73    | 0.47 | 0.      | 0.45    | 0.8        | 0.33

Table 1: The F1-measure of every kernel on each ACE-2005 main relation type. For every relation type, the best result is shown in bold in the original table. (Some digits of the transcribed values are missing and are reproduced here as found.)

We have used the LIBSVM (Chang and Lin, 2001) Java source for SVM classification and the Stanford NLP package for tokenization, sentence segmentation, and parsing. Following (Bunescu and Mooney, 2007), every pair of entities within a sentence is regarded as a negative relation instance unless it is annotated as a positive relation in the corpus. The total number of negative training instances constructed in this way is about 20 times the number of annotated positive instances. Thus, we also imposed a restriction of a maximum argument distance of 10 words. This constraint eliminates half of the constructed negative instances while only slightly decreasing the positive instances. Nevertheless, since the resulting training set is still unbalanced, we used the LIBSVM weighting mechanism: precisely, if there are P positive and N negative instances in the training set, a weight value of N/P is used for positive instances while the default weight value of 1 is used for negative ones. A binary SVM is trained for every relation type separately, and type-compatible annotated and constructed relation instances are used to train it. For each relation type, only type-compatible relation instances are exploited for training. For example, to learn an ORG-AFF relation (which applies to (PER, ORG) or (ORG, ORG) argument types), it is meaningless to use a relation instance between two entities of type PERSON. Moreover, the total number of training instances used for every relation type is restricted to 5000 instances to shorten the duration of the evaluation process. The reported results are achieved using a 5-fold cross-validation method. The kernels AAP, AAPD, TSAAPD-0 (TSAAPD with threshold 0) and TSAAPD-1 (TSAAPD with threshold 1) are compared with the CD'01 convolution tree kernel.
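The N/P class-weighting rule described above balances the total contribution of the two classes in the SVM objective, which a short sketch makes explicit. The instance counts used here are hypothetical, not figures from the paper.

```python
# Sketch of the class-weighting scheme: with P positive and N negative
# training instances, positives receive weight N/P (negatives keep the
# default weight 1), so each class contributes equally overall.

P, N = 420, 4580          # hypothetical counts in a 5000-instance training set

w_pos = N / P             # weight applied to every positive instance
w_neg = 1.0               # default weight for every negative instance

# Total weighted mass of each class is now equal:
assert abs(P * w_pos - N * w_neg) < 1e-9
```

In LIBSVM this corresponds to setting a per-class penalty weight, which scales the misclassification cost C for that class.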
(The Stanford NLP package is available at http://nlp.stanford.edu/software/index.shtml.)

All kernels except AAP are computed on the PT portion described in section 2. AAP is computed over the MCT tree portion, which is also proposed by (Zhang et al., 2006a) and is the sub-tree rooted at the first common ancestor of the relation arguments. For the proposed kernels, α is set to 0.44, which is tuned on a development set that contained 5000 instances of type PHYS. The λ parameter of the CD'01 kernel is set to 0.4 according to (Zhang et al., 2006a). The C parameter of SVM classification is set to 2.4 for all kernels, after tuning it individually for each kernel on the mentioned development set.

6.2 Experimental Results

The results of the experiments are shown in Table 1. The proposed kernels outperform the original CD'01 kernel in four of the six relation types. The performance of TSAAPD-1 is especially remarkable because it is the best kernel on the ORG-AFF and PER-SOC relations. It particularly performs very well in the extraction of the PER-SOC relation, with an F1-measure of 0.73. It should be noted that the generally low performance of all kernels on the GEN-AFF type is due to its extremely small number of annotated instances in the training set (40 in 5000). The AAPD kernel has the best performance, with a remarkable improvement over the Collins and Duffy kernel, on the GEN-AFF relation type. The results clearly demonstrate that the nodes closer to the ancestor paths of the relation arguments contain the most useful syntactic features for relation extraction.

7 Conclusion

In this paper, we proposed a generalized convolution tree kernel that can generate various syntactic sub-kernels, including the CD'01 kernel.

The kernel is generalized by assigning weights to the sub-trees. The weight of a sub-tree is the product of the weights assigned to its nodes by two types of weighting functions. In this way, the impacts of tree nodes on the kernel value can be discriminated purposely, based on the application. Context information can also be injected into the kernel via context-sensitive weighting mechanisms. Using the generalized kernel, various sub-kernels can be produced by different definitions of the two weighting functions. We consequently used the generalized kernel for the systematic generation of useful kernels for relation extraction. In these kernels, the closer a node is to the relation arguments' ancestor paths, the less it is decayed by the weighting functions. Evaluation on the ACE-2005 main relation types demonstrates the effectiveness of the proposed kernels: they show a remarkable performance improvement over the CD'01 kernel.

8 Future Work

Although the Path-Enclosed Tree portion (PT) (Zhang et al., 2006a) seems to be an appropriate portion of the syntactic tree for relation extraction, it only takes into account the syntactic information between the relation arguments and discards many useful features before and after the arguments. It seems that the generalized kernel can be used with larger tree portions that contain the syntactic features before and after the arguments, because it can be more easily targeted at the related features. Currently, the proposed weighting mechanisms are based solely on the location of tree nodes in the parse tree; however, other useful information, such as the labels of nodes, can also be used in the weighting. Another future work can be utilizing the generalized kernel for other applicable NLP tasks such as co-reference resolution.

Acknowledgement

This work is supported by the Iran Telecommunication Research Centre under contract No. 500-775.

References

Boser B. E., Guyon I., and Vapnik V. 1992. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144-152. ACM Press.

Bunescu R. C. and Mooney R. J. 2005a. A Shortest Path Dependency Kernel for Relation Extraction. EMNLP-2005.

Bunescu R. C. and Mooney R. J. 2005b. Subsequence kernels for relation extraction. NIPS-2005.

Bunescu R. C. and Mooney R. J. 2007. Learning for Information Extraction: From Named Entity Recognition and Disambiguation to Relation Extraction. Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin.

Chang C.-C. and Lin C.-J. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Collins M. and Duffy N. 2001. Convolution Kernels for Natural Language. NIPS-2001.

Cortes C. and Vapnik V. 1995. Support-vector networks. Machine Learning, 20, 273-297.

Li J., Zhang Z., Li X. and Chen H. 2008. Kernel-based learning for biomedical relation extraction. Journal of the American Society for Information Science and Technology, 59(5), 756-769.

Moschitti A. 2006a. Making tree kernels practical for natural language learning. EACL-2006.

Moschitti A. 2006b. Syntactic kernels for natural language learning: the semantic role labeling case. HLT-NAACL-2006 (short paper).

Walker C., Strassel S., Medero J. and Maeda K. 2006. ACE 2005 Multilingual Training Corpus. Linguistic Data Consortium, Philadelphia.

Zhang M., Zhang J. and Su J. 2006a. Exploring syntactic features for relation extraction using a convolution tree kernel. HLT-NAACL-2006.

Zhang M., Zhang J., Su J. and Zhou G.D. 2006b. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. COLING-ACL-2006, pages 825-832.

Zhou G.D., Su J., Zhang J. and Zhang M. 2005. Exploring Various Knowledge in Relation Extraction. ACL-2005.

Zhou G.D., Zhang M., Ji D.H. and Zhu Q.M. 2007. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP-CoNLL-2007.
Learnng for Informaton Extracton: From Named Entty Recognton and Dsambguaton to Relaton Extracton, Ph.D. Thess. Department of Computer Scences, Unversty of Texas at Austn. Chang, C.-C. and C.-J. Ln 00. LIBSVM: a lbrary for support vector machnes. Software avalable at http://www.cse.ntu.edu.tw/~cjln/lbsvm. Collns M. and Duffy N. 00. Convoluton Kernels for Natural Language. NIPS-00 Cortes C. and Vapnk V. 995. Support-vector network. Machne Learnng. 0, 73-97. L J., Zhang Z., L X. and Chen H. 008. Kernel-based learnng for bomedcal relaton extracton. J. Am. Soc. Inf. Sc. Technol. 59, 5, 756 769. Moschtt A. 006a. Makng tree kernels practcal for natural language learnng. EACL-006. Moschtt A. 006b. Syntactc kernels for natural language learnng: semantc role labelng case. HLT-NAACL-006 short paper) Walker, C., Strassel, S., Medero J. and Maeda, K. 006. ACE 005 Multlngual Tranng Corpus. Lngustc Data Consortum, Phladelpha. Zhang M., Zhang J. and SU j. 006a. Explorng syntactc features for relaton extracton usng a convoluton tree kernel. HLT-NAACL-006. Zhang M., Zhang J., Su J. and Zhou G.D. 006b. A Composte Kernel to Extract Relatons between Enttes wth both Flat and Structured COLINGACL-006: 85-83. Zhou G.D., Su J, Zhang J. and Zhang M. 005. Explorng Varous Knowledge n Relaton Extracton. ACL-005 Zhou G.D., Zhang M., J D.H. and Zhu Q.M. 007. Tree Kernel-based Relaton Extracton wth Context- Senstve Structured Parse Tree Informaton. EMNLP-CoNLL-007 Boser B. E., Guyon I., and Vapnk V. 99. A tranng algorthm for optmal margn classfers. In 7