STD and SCR Techniques and Their Evaluations on the NTCIR-10 SpokenDoc-2 task

Size: px
Start display at page:

Download "STD and SCR Techniques and Their Evaluations on the NTCIR-10 SpokenDoc-2 task"

Transcription

1 STD nd SCR Techniques nd Their Evlutions on the NTCIR-10 SpokenDoc-2 tsk Yuto Furuy University of Ymnshi Tked, Kofu Ymnshi, , Jpn furuylps-lb.org Hiromitsu Nishizki University of Ymnshi Tked, Kofu Ymnshi, , Jpn hnishiymnshi.c.jp Diki Nkgomi University of Ymnshi Tked, Kofu Ymnshi, , Jpn dikilps-lb.org Yoshihiro Sekiguchi University of Ymnshi Tked, Kofu Ymnshi, , Jpn sekigutiymnshi.c.jp Stoshi Ntori University of Ymnshi Tked, Kofu Ymnshi, , Jpn ntorilps-lb.org ABSTRACT This pper describes spoken term detection (STD) nd spoken contents retrievl (SCR) techniques nd their evlutions t the NTCIR-10 SpokenDoc-2 tsk. First of ll, we describes our STD technique using phoneme trnsition network (PTN) derived from multiple speech recognizers outputs nd its evlutions t the STD nd the istd (inexistent STD) tsks. Next, we introduce our SCR technique using Web documents for expnding the trget spoken documents. It is evluted on the two SCR tsks. Ctegories nd Subject Descriptors H.3.3 [Informtion Storge nd Retrievl]: Informtion Serch nd Retrievl Generl Terms Algorithms, Experimenttion, Performnce Keywords Multiple recognizers, phoneme trnsition network, spoken conetent retrievl, spoken term detection, spoken document segmenttion Term nme: [ALPS] Subtsk: [Spoken Term Detection], [Spoken Contents Retrievl] Lnguge: [Jpnese] 1. INTRODUCTION Recently, informtion technology environments hve evolved in which numerous udio nd multimedi rchives, such s video rchives, nd digitl librries, cn be esily used. In prticulr, rpidly incresing number of spoken documents, such s brodcst progrms, spoken lectures, nd recordings of meetings, re rchived, with some of them ccessible through the Internet. Although the need to retrieve such spoken informtion is growing, n effective retrievl technique is currently not vilble; thus, the development of technology for retrieving such informtion hs become incresingly importnt. In the Text REtrievl Conference (TREC) Spoken Document Retrievl (SDR) trck hosted by the Ntionl Institute of Stndrds nd Technology (NIST) nd the Defense Advnced Reserch Projects Agency (DARPA) in the second hlf of the 1990s, mny studies of SDR were presented using English nd Mndrin brodcst news documents [5]. The TREC SDR is n d-hoc retrievl tsk tht retrieves spoken documents which re highly relevnt to user query. In 2006, NIST initited the Spoken Term Detection (STD) project with pilot evlution nd workshop [12]. The im of STD is to find the position of spoken terms selected for evlution from n udio rchive. The difficulty in STD lies in the serch for terms in vocbulry-free frmework, becuse serch terms re not known prior to the speech recognizer being used. Mny studies deling STD tsks hve been proposed, such s [16, 9]. Most STD studies focus on the out-of-vocbulry (OOV) nd speech recognition error problems. For exmple, STD techniques tht use entities such s sub-word lttice nd confusion network (CN) hve been proposed. This pper describes our STD nd SCR (spoken content retrievl) evlution on the NTCIR-10 SpokenDoc-2 tsk. The former prt of the pper introduces our STD nd istd techniques nd these evlutions. The other prt of the pper shows the SCR technique nd its evlution results. Our STD technique uses phoneme trsition network (PTN)-formed index derived from multiple speech recognizers 1-best hypothesis nd dynmic time wrping (DTW) frmework with flse control prmeters pplied t the term serching. PTN-bsed indexing origintes from the ide of the CN being generted from speech recognizer. CN-bsed indexing for STD is powerful indexing method. The multiple speech recognizers cngenerte the PTN-formed index by combining sub-word (phoneme) sequences from the output of these recognizers into single CN. This study uses 10 types of speech recognition systems with the sme decoder used for ll. Two types of coustic models (triphonebsed nd syllble-bsed Hidden Mrkov Models (HMMs)) nd five types of lnguge models (word-bsed or sub-word bsed) were prepred. The use of the 10 recognizers nd their output is very ef- 626

2 fective in improving speech recognition performnce. For exmple, Fiscus [4] proposed the ROVER method tht dopts word voting scheme. Utsuro et l. [15] developed technique for combining multiple recognizers output by using support vector mchine (SVM) to improve speech recognition performnce. Appliction of the chrcteristics of the word (or sub-word) sequence output by recognizers my increse STD performnce since these chrcteristics re different for ech speech recognizer. The PTN-bsed on multiple speech recognizers output cn cover more sub-word sequences of spoken terms. Thus, multiple speech recognizers my improve STD performnce compred to tht of single recognizer s output. We evluted the PTN-formed index derived from the 10 recognizers output. The experimentl result for the Jpnese STD test collection [6] showed tht using the PTN formed index ws effective in improving STD performnce compred to tht of the CN-formed index, which ws derived from the phoneme-bsed CN mde up to the 10-best phoneme sequence outputs from single speech recognizer [10, 11]. However, mny flse detection errors occurred becuse the PTN-formed index hd redundnt phonemes tht were mistkenly recognized by some speech recognizers. Using more speech recognizers cn chieve higher speech recognition performnce but more errors my occur. Therefore, we dopted mjority voting prmeter nd mesure of mbiguity, which re esily derived from the PTN, into the term serch engine. We instlled the voting nd mbiguity prmeters of the PTN to our term serch engine to prevent flse detections. Furthermore, we refined the serch engine, where the DTW cost used for clculting the distnce is dynmiclly chnged depending on length of query term (the number of phonemes consisting of query term). We lso chllnged the SCR on the lecture retrievl nd the phrse retrievl tsk. Our SCR technique uses spoken document expnsion using WEB documents. There re two dvntges for using document expnsion using WEB on n SCR tsk. One of dvntges is robust for speech recognition errors nd OOV. The other is to supplement keywords tht re not uttered in the spoken document, which user wishes to look for. For solving the speech recognition errors nd OOV problems, we prepred two kinds of indexes: One is mde from trnscriptions of spoken documents by n LVCSR system; the other is mde from WEB pges relted to the trget spoken documents. In our pproch, WEB pges re retrieved by serch engine tht uses WEB serch queries utomticlly composed from trnscriptions of the trget spoken documents. When document expnsion techniques re used in the retrievl of these documents, WEB pges my be the most suitble, becuse the Internet hs wide vriety of topics. Consequently, in this pper, we investigte the effectiveness of document expnsion by using WEB pges on SCR. The problem of document expnsion using WEB for SCR is to how to collect WEB pges whoes contents re similr to the spoken documents. We performed WEB pges collection by humn power, then, the SDR performnce ws drsticlly improved compred with the performnce without the document expnsion. However, it is hrd to collect suitble WEB pges. So, we propose n WEB collection method using n utomtic spoken document segmenttion Speech dt Recognition system #1 Recognition system #2 Recognition system #N Serch terms Indexing phse 1-best 1-best 1-best Converting to PTN Converting to phoneme sequence Serch phse PTN-bsed index Term Serch Engine Result Figure 1: Overview of the STD frmework. method [14]. Most of the spoken documents hve some topics. Collecting WEB pge for ech topic my improvement the qurity of WEB pges. This pper introduces the our STD, istd nd SCR systems nd reports these systems performnces on the NTCIR- 10 SpokenDoc-2 tsks. 2. STD TASK 2.1 Outline Figure 1 represents n outline of the STD frmework in this pper. In the indexing phse, speech dt is performed by speech recognition nd the recognition outputs (word or sub-word sequences) re converted into the PTN index for STD. In the serch phse, the word-formed query is converted into the phoneme sequence, then the phoneme-formed query is input to the term serch engine. In the cse of treting English queries, we hve to consider the vriety of pronuncitions of the queries. There re some reports fighting the pronuncition problem[17]. In this pper, however, we hndle Jpnese STD. Most of Jpnese words cn be completely trnslted to phoneme sequence (pronuncition). Therefore, we do not consider the pronuncition problem in this pper. The term serch engine serches the input query term from the index in phoneme level using the DTW frmework. 2.2 PTN-bsed indexing Figure 2 shows n exmple of mking prt of PTN-formed index of n utternce cosine (Jpnese pronuncition is /k o s i N/) by performing the lignment process of 10 of phoneme sequences from 1-best hypotheses of the ASRs. We used 10 types of ASRs for mking PTN-bsed index. In Fig. 2, the utternce is recognized by the 10 ASRs, nd then, the 10 hypotheses re obtined. They re converted into the phoneme sequences. Next, we cn get ligned sequences by performing dynmic progrmming (DP) scheme s well s mking the CN-formed index. Finlly, the PTN is gotten by converting the ligned sequences. in Fig. 2 indictes 627

3 Input voice dt : Cosine (/k o s in/ ) LM/AM Outputs of 10 recognition systems (ll outputs re converted into phoneme sequence) WBC/Tri. k o s i WBH/Tri. q o s u N CB/Tri. k o s m i BM/Tri. k o s N Non/Tri. k o s N WBC/Syl. s N WBH/Syl. b o s CB/Syl. s b i BM/Syl. s N Non/Syl. s N rch term Ser i N k o s DTW lttice NULL trnsitions noinsertion errors NULL trnsition noinsertion error Totl cost (distnce): 0.3 k q b Terminl Node o s u Arc Figure 2: Mking PTN-bsed index by performing lignment using on DP nd converting to PTN. null trnsition. Arcs between nodes in the PTN hve some phonemes nd null trnsitions with n occurrence probbility. However, in this pper, we do not use ny phoneme occurrence probbilities. 2.3 Term serch engine with flse detection control The term serch engine uses the DTW-bsed word-spotting method. Figure 3 represents n exmple of the DTW frmework between the serch term k o s i N (cosine) nd for the PTN-formed index. The PTN hs multiple rcs between djoining two nodes. These rcs re compred to one of phoneme lbels of query term. We use edit distnce s cost on the DTW pths, nd the cost vlue for substitution, insertion nd deletion errors is commonly set to 1.0 when the number of phonemes including query is N or lrger thn N. On the other hnd, ech cost is commonly set to 1.5 when the number of phonemes is less thn N to void flse term detections in query terms, hving less number of phonemes. This cost (=1.5) is optimized using development query set. The totl cost D(i, j) t the grid point (i, j) (i = {0,..., I}, j = {0,..., J}, where I nd J re the number of the set of rcs in n index nd query term respectively) on the DTW lttice is clculted by following equtions: D(i, j) = min m b Node D(i, j 1) + Del D(i 1, j) + Null(i) D(i 1, j 1)+ Mtch(i, j) + V ot(i, j) + Acw(i) 0.0 : Query(j) P T N(i) Mtch(i, j) = 1.0 : Query(j) P T N(i), J N 1.5 : Query(j) P T N(i), J < N Del = { 1.0 : J N 1.5 : J < N i N (1) (2) (3) k q b o s u CN(PTN)-bsed index Figure 3: Exmple of term serch on network bsed index. Null(i) = α V oting() β V oting() m b : NULL P T N(i), J N : NULL P T N(i), J < N 1.0 : NULL P T N(i), J N 1.5 : NULL P T N(i), J < N,where P T N(i) is the set of phoneme lbels of the rcs t the i-th node in the PTN, nd Query(j) indictes the j-th phoneme lbel in the query term. We llow null trnsition between two nodes in the PTN-formed index with the cost defined in Eq.(4). When the query term mtches to null () in the PTN, trnsition cost is dynmiclly set s shown in Eq.(4). V oting() mens the number of ASRs outputting NULL t the sme rc. We cll it null voting. α nd β re hyper prmeter, which is optimized using the development set. The pproprite null cost chieves incresing term detection nd decresing flse detections. V ot(i, j) nd Acw(i) in Eq.(1) re relted to the flse detection control prmeters nd clculted s follows: V ot(i, j) = γ V oting(p) : p P T N(i), p = Query(j) 0.0 : Query(j) P T N(i) i N (4) (5) Acw(i) = δ ArcW idth(i) (6) We provide two types of prmeters to control flse detection s follows: V oting(p) is the number of ASRs outputting the sme phoneme p t the sme rc. More vlue of V oting(p) mkes relibility of phoneme p better. ArcW idth(i) is the number of rcs (phoneme lbels) t P T N(i). Less vlue of ArcW idth(i) mkes lso relibility of phonemes t P T N(i) better. γ nd δ re lso hyper prmeters nd set to 0.5 nd 0.01 respectively, optimized by the development query set. In dvnce serching the query term, the term serch engine initilizes D(i, 0) = 0, nd then, clcultes D(i, j) using Eq.1 (i = {0,..., I}, j = {1,..., J}). Furthermore, D(i, J) re normlized by length of the DTW pth. 628

4 After finishing the clcultion, the engine outputs the detection cndidtes which hve normlized cost D(i, J) below threshold θ. Chnging the θ vlue enbles us to control the recll nd precision rtes on STD performnce. 2.4 STD experiment Speech Recognition As shown in Figure 1, the speech dt is recognized by the 10 ASRs. Julius ver [7], n open source decoder for LVCSR, is used in ll the systems. We prepred two types of coustic models (AMs) nd five types of lnguge models (LMs) for constructing the PTN. The AMs re triphone bsed (Tri.) nd syllble bsed HMMs (Syl.), where both types of HMMs were trined from the spoken lectures in the Corpus of Spontneous Jpnese (CSJ) [8]. All the LMs re word nd chrcter bsed trigrms s follows: WBC : word bsed trigrm in which words re represented by mix of Chinese chrcters, Jpnese Hirgn nd Ktkn. WBH : word bsed trigrm in which ll words re represented only by Jpnese Hirgn. The words composed of Chinese chrcters nd Ktkn re converted into Hirgn sequences. CB : chrcter bsed trigrm in which ll chrcters re represented by Hirgn. BM : chrcter sequence bsed trigrm in which the unit of lnguge modeling is two of Hirgn chrcters. Non : No LM is used. Speech recognition without ny LM is equivlent to phoneme (or syllble) recognition. Ech model is trined from the mny trnscriptions in the CSJ under the open for the speech dt of STD. Finlly, the ten combintions, comprising two AMs nd five LMs, re formed. The condition is completely the sme s the description in the overview pper [1] Query set of the STD sub-tsks The NTCIR-10 SpokenDoc orgnizers provided two types of query sets; one, clled s moderte-size, is for the cdemic lectures of the spoken document processing workshop (SDPWS) nd the other, clled s lrge-size is for the ll 2702 lectures in CSJ[1]. The moderte-size query set consists of 200 terms, which is the sme query set for the istd tsk. Hlf terms mong the query set re not included in the SDPWS lecture speeches. 2.5 Experimentl result of the STD tsks Figure 5 nd Figure 4 shows the recll-precision curves tht show the STD performnce for the forml-run modertesize nd lrge-size query sets, respectively. BL-1, BL- 2 nd BL-3 show the bseline results provided by the SpokenDoc-2 orgnizers. The bseline systems hve dynmic progrmming (DP)-bsed word spotting. The score between query term nd n IPU is clculted bsed on phoneme-bsed edit distnce[1]. Tble 1 lso indictes the mximum F-mesure nd MAP vlues on the sme test sets. The decision point for culculting spec. F ws decided by the result of the NTCIR- 9 SpokenDoc STD forml-run query set. We djusted the Precision [%] ALPS Recll [%] Figure 4: Recll-precision curves for the lrge-size query set. threshold to be the best F-mesure vlue on the forml-run set which ws used s development set. Our STD system got the better performnces rther thn the other prticipnts runs[1]. Using multiple LVCSRs outputs ws very powerful for improving STD performnce. And, the flse detection control prmeters worked well to reduce detection error cndidtes. 2.6 Time consumption of the STD processing Our technique hs high performnce on precision-recll curve nd MAP vlue but needs too much time to serch term from the lrge scle speech dt. It tkes 13.5 seconds to serch term from the CSJ CORE speeches (39 hours dt) on computer which hs Intel Core i GHz CPU. 3. ISTD TASK 3.1 Tsk description nd test collection In the istd tsk, we inspect whether queried term is existent or inexistent in the SDPWS lecture speeches. The query set is the sme s the STD moderte-size tsk, which hs 200 queried terms. The hlf of the terms re inexistent in the SDPWS speeches. 3.2 istd Engine Our istd engine is lmost the sme s the engine of the STD tsk. We dditionlly uses entropy vlue of detected cndidte for rnking the query terms. An entropy vlue of n rbitrry intervl in PTN is clculted using the number of phonemes nd posterior probbility of phoneme. A posterior probbility t ny position of PTN is clculted bsed on the number of ASRs tht output the phoneme. Detection entropy (DE t ) of detected cndidte t for BL-1 BL-2 BL-3 629

5 Tble 1: STD performnces for the ech query set. set run micro ve. mcro ve. index serch mx. F [%] spec. F [%] mx. F [%] spec. F [%] MAP size [MB] speed [s] BL lrge- BL size BL ALPS BL moderte- BL size BL ALPS Precision [%] ALPS-1 BL-1 BL-2 BL-3 Recll-Precision curve, Mximum F-mesure (= the blnced point on Recll- Precision curve), Recll, precision nd F-mesure t the specific rnk N (the end point of the curve), Recll nd precision limiting the terms which hve detection= no. Recll nd Precision rtes for terms positioned rnk r nd more thn r re clculted s following functions: Recll r = T /,r N / 100(%) Recll [%] Figure 5: Recll-precision curves for the modertesize query set. queried term is clculted s following equtions: J i V E i = DE t = 1 T j=1 t e 1 V oting(p ij ) R log 2 V oting(p ij ) R (7) i=t s V E i (8) V E i is voting entropy between i-th nd (i + 1)-th nodes in PTN. p ij represents the j-th phoneme t the rcs between i-th nd (i + 1)-th nodes in the PTN, nd J i is the number of phonemes (rcs) between the i-th nd (i + i)-th nodes. V oting(p ij) shows the number of ASRs tht output the phoneme p ij. R is the number of ll ASRs for mking PTN. In this pper, R is 10. DE t is clculted by Eqn. (8). t s nd t e re the first node nd the lst node of cndidte t in PTN, respectively. T is the number of nodes between t s nd t e. The highest normrized DTW cost of the cndidte of query term nd its entropy vlue re linely-combined. Finlly, the 200 terms re rnked (ordered) ccording to combined (DTW cost + entropy) score. 3.3 Evlution metric Evlution metric we used in this tsk re s follows: P recision r = T /,r r 100(%), where T /,r mens the number of / terms positioned rnk r nd more thn r, N / is the totl number of terms belong to clss /. By chnging r from 1 to N, recll-precision curve cn be drwn. A mximum F-mesure tht is from the best blnced point in the curve will lso be used for evlution. 3.4 Result Figure 6 nd Tble 2 show the our istd performnce. ALPS-1 used entropy for rnking the query terms, nd ALPS-2 only used DTW cost. Our STD engine outperformed the bseline systems. Our system ws robust for flse detection errors, lthough more trnscriptions were used. This is becuse tht the flse detection control prmeters nd the entropy of cndidte worked well nd they re effective to reduce flse detections. 4. SCR TASK 4.1 Spoken Document Retrievl Using Web Document Expnsion Figure 7 shows the outline of our proposed technique. First, spoken documents re utomticlly trnscribed by n LVCSR system. The document index is built by removing stop words from the trnscriptions. In this pper, we cll it Recognition-Index (RI). Next, the other index is mde from WEB pges s follows: 1. Ech the trnscription is utomticlly divided into some segments depending on topic. 2. Queries for WEB serches re composed from the trnscriptions of segmented spoken documents. For ech 630

6 Tble 2: istd performnces. (*1) Recll, precision nd F-mesure rtes clculted by top-100-rnked outputs. (*2) Recll, precision nd F-mesure rtes clculted by using outputs with detection=no tg which is specified by ech prticipnt. (*3) Recll, precision nd F-mesure rtes clculted by top-n-rnked outputs. N is set to obtin the muximum F-mesure. run Rnk Specified 2 Mximum 3 R [%] P [%] F [%] R [%] P [%] F [%] rnk R [%] P [%] F [%] rnk BL BL BL ALPS ALPS Spoken documents dtbse Internet 90 LVCSR Trnscriptions Automtic Document Segmenttion Composing queries for Web serch Web serch engine Recll [%] ALPS-1 ALPS-2 20 BL-1 BL-2 10 BL-3 Recognition-Index Retrievl engine for Recognition-Index Two kinds of indexes Web-Index Retrievl engine for Web-Index Post-process of Web documents I d like to serch lectures on Query for SDR Precision [%] Integrting outputs from the two retrievl engines Finl Results Figure 6: Recll-precision curves t the istd tsk. spoken document, multiple queries, which depend on the number of segments of spoken document, re prepred. 3. For ech segment, n WEB serch engine collects WEB pges from the Internet using the query. Most of the collected WEB pges my be relted to the prt of spoken document from which the serch query is mde. 4. Stop words re removed from the collected WEB pges. The WEB-bsed index, which we cll the WEB-Index (WI), is mde from them. The performnce of retrievl using the WI depends on how the WEB serch query is composed. In ddition, the qulity of the query relies on speech recognition performnce during trnscription of the spoken document. However, the object of this study is to investigte whether the WI is effective in spoken document retrievl. Therefore, we dopt very simple query composition method, s described in Section 4.3. In the spoken document retrievl process, two retrievl engines serch the spoken documents, ech using one of the indexes. A query is input to ech retrievl engine; the finl retrievl results re obtined by integrting their two outputs bsed on retrievl score ttched to ech retrievl document. 4.2 Automtic Document Segmenttion Figure 7: A frmework of spoken document retrievl by using document expnsion using WEB. To use WEB pges in spoken document retrievl, queries for n WEB serch engine re composed by extrcting keywords from trnscriptions derived by recognizing spoken documents. However, the trnscriptions include number of words unsuitble for documents on specific topic. Therefore, it is difficult to extrct keywords tht well-represent the topic of spoken document. To solve this problem, first, the spoken documents re utomticlly divided into some segments, then, queries re composed from ech segment. Most of spoken documents hve severl kinds of topics. The segmenttion of spoken documents mkes WEB serch queries to be better for collecting WEB pges. We use the text segmenttion lgorithm proposed by Utiym et l.[14] for spoken document segmenttion. Although the lgorithm ws dopted to word sequences in [14], we extended the lgorithm to ccept sentence sequences. First, we put nodes on the beginning of the document, the end of the document, nd between ll two successive sentences. The cost vlue c ij for segment S ij between two rbitrry nodes i nd j is defined s follows: c ij = length l=1 log length + k tf l + penlty log W (9) 631

7 : c 02 c01:1.744 c12: c 23 : ,, 1,,,, 2 b,b,b,b 3 node : c 02 : c 03 c : c 23 0,, 1,,,, 2 b,b,b,b 3 Figure 8: Exmple of utomtic document segmenttion. where length is the totl number of words contined in segment S ij nd k is the number of kinds of words in the whole document. tf l mens term frequency of word w l occured in S ij nd W is the totl number of words in the whole document. penlty is hyper-prmeter tht cn control the number of segments. The dynmic progrmming (DP) scheme using the cost cn indicte some segmenttion nodes by minimizing the totl DP costs. Figure 8 shows n exmple of document segmenttion. There re three sentences: S 01 =, S 12 =, nd S 23 = bbbb nd the penlty prmeter is set to 1.0 in Figure 8. In this cse, k nd W re 2 nd 12, respectively. For exmple, the cost of the segment S 03 cn be clculted by using Eqution(9) nd its vlue is c 03 = Finlly, the document is divided into two segments: S 02 nd S 23 becuse the totl cost is c 02 + c 23 = , which is smller thn c Query Composition The query composition method we dopted ws very simple. Queries for spoken document re composed of wordbsed N-grms tht occur with very high frequent in the document s trnscription. The procedure contins only five steps, s follows: 1. A trnscription of spoken document is utomticlly divided into M segments by the utomtic segmenttion scheme explined in Section Word-bsed N-grms re extrcted from M segmented trnscription. An N-grm is denoted s sequence of specific prt of speech, nmely, noun nd postpositionl prticle. The length of N is not limited. 3. Some of the N-grms do not exist in the rel world becuse these re extrcted from trnscription contining mny speech recognition errors. These re filtered out by using the corpus of WEB Jpnese N-grm 1st edition, which contins N-grms mde from WEB dt collected by Google Jpn, Inc. In this pper, we cll the corpus Google N-grm. 4. Stop words tht re included in the N-grms re removed. Then, the five most frequent N-grms re extrcted from ech segment. 5. Finlly, M N-grm query sets (ech set hs five N- grms) is used to serch WEB pges. Up to mximum 50 WEB pges re collected for ech spoken document. The number of WEB pges collected by query from segment depends on the number of sentences in the segment. If segment hs 50% of sentences in document, the query from the segment cn get 25 WEB pges. 4.4 Collecting WEB Documents nd Mking WEB-Index For ech query, WEB pges re collected by n WEB serch engine using the query mde from the trnscription of the spoken document. We used Yhoo! WEB serch API s the serch engine in this study. Stop words re removed from the collected WEB pges. WEB pges collected by query set from segment re integrted into file. Ech file corresponds to one specific segment. Finlly, WI is mde of them. 4.5 Spoken Document Retrievl Engine We used the Generic Engine for Trnsposble Assocition (GETA) [3] s the spoken document retrievl engine. The GETA cn relize fst computtion of similrity between query vector nd document vectors. In this reserch, word-unit indexes (RI nd WI) needed to retrieve spoken documents re constructed from only the content words of trnscriptions of spoken documents (RI) or WEB documents (WI) by removing stop words. Nouns, verbs, nd djectives re dopted s content words. Queries for spoken document retrievl re sentences in the form of List some World Heritge sites, for exmple, so morphologicl nlysis is performed on the query to segment it into word sequence. After removing stop words from the sequence, the query is input to the GETA engine. The computtion of similrity between query vector nd document vectors is done by the SMART method [13], which is bsed on cosine similrity, like TF-IDF nd is vilble in the GETA. Ech indexed word is weighted by the TF-IDF method, in which the TF vlue of ech word is normlized on document length (number of words). 4.6 Integrting Outputs from Two Retrievl Engines The finl retrievl results re obtined by integrting the outputs from two retrievl engines. One retrievl engine uses RI, nd the other uses WI. Ech engine outputs list of spoken documents, in order of similrity score. Therefore, we cn get the finl retrievl results ( list of documents) by combining the two similrity scores from RI nd WI. The finl combined similrity score sim(d) for spoken document from the two engines is clculted s follows: sim(d) = (1 α) sim(d r) + α mx sim(ds w) (10) s S where sim(d r) is the score from the RI engine. Suppose tht S = {d 1, d 2,, d M } is segment set when document d is divided into M segments. So, sim(d s w) is the score of segment s for document d from the WI engine. In retrievl experiments, α is empiriclly set to 0.2 for the ll queries. In ddition to this, we lso tried to estimte α utomticlly for ech query using the development set tht is the NTCIR-9 SpokenDoc SDR tsk dt [2]. To do it, we used multiple liner regression nlysis in which n OOV 632

8 Tble 3: Evlution results for the lecture retrievl tsk. run MAP bseline bseline ALPS ALPS Tble 4: Evlution results for the pssge retrievl tsk. run umap pwmap fmap bseline bseline ALPS ALPS rte of query nd retrievl score (SMART) were used s explntory vrible. 4.7 Experimentl result Tble 3 nd Tble 4 show the retrievl results for the lecture retrievl tsk nd the pssge retrievl tsk, respectively. In the ALPS-1 system, α ws utomticlly estimted for ech query. And ALPS-2 used the sme α vlue of 0.2 for the ll query. The web documents expnsion with the spoken document segmenttion ws effective for improving the SCR performnce t the lecture retrievl tsk. However, our proposed method did not worked well t the pssge retrievl tsk. 5. CONCLUSION This pper described the STD, istd nd SCR techniques for the NTCIR-10 SpokenDoc-2. First, we introduced the PTN-bsed indexing, which is essentilly phoneme-bsed CN, derived from multiple speech recognizers outputs. One of the ims of this study ws to use multiple outputs of LVCSRs for constructing the PTNformed index for STD, which is different from the sub-word bsed pproches proposed erlier. Furthermore, we instlled the flse detection control prmeters, mjority voting nd the width of the rc in the PTN, to the DTW frmework. The results indicte tht this ws very effective in controlling flse detection in the query term set. Second, we chllenged the istd tsk using lmost the sme STD engine used on the STD tsk. The flse detection control prmeters lso worked well on the istd tsk. In ddition to this, entropy vlue of detected cndidte ws useful for filtering out flse detection cndidtes. Finlly, we hve shown the SCR technique tht is the document expnsion using WEB pges collected by the document segmention. Our technique worked well t the lecture retrievl tsk. However, it ws not effective on the pssge retrievl tsk. In future work, we intend to develop fst STD lgorithm under the DTW frmework. The processing speed of our engine is too slow for prcticl use. 6. ACKNOWLEDGMENTS This work ws supported by JSPS KAKENHI Grnt-in- Aid for Young Scientists (B) Grnt Number nd Grnt-in-Aid for Scientific Reserch (C) Grnt Number REFERENCES [1] T. Akib, et l. Overview of the ntcir-10 spokendoc-2 tsk. In Proc. of the NTCIR-10 Conference, 10 pges, [2] T. Akib, et l. Overview of the ir for spoken documents tsk in ntcir-9 workshop. In Proc. of the NTCIR-9 Workshop, 8 pges, [3] A. T. et l. Generic engine for trnsposble ssocition (get). In [4] J. G. Fiscus. A Post-processing System to Yield Reduced Word Error Rtes: Recognizer Output Voting Error Reduction (ROVER). In Proc. of ASRU 97, pp , [5] J. S. Grofolo, et l. The trec spoken document retrievl trck: A success story. In Proc. of the Text Retrievl Conference (TREC) 8, pp.16 19, [6] Y. Itoh, et l. Constructing jpnese test collections for spoken term detection. In Proc. of INTERSPEECH2010, pp , [7] A. Lee nd T. Kwhr. Recent development of open-source speech recognition engine julius. In Proc. of APSIPA ASC 2009, 6 pges, [8] K. Mekw. Corpus of Spontneous Jpnese: Its design nd evlution. In Proceedings of the ISCA & IEEE Workshop on Spontneous Speech Processing nd Recognition (SSPR2003), pges 7 12, [9] S. Meng, et l. Addressing the out-of-vocbulry problem for lrge-scle chinese spoken term detection. In Proc. of INTERSPEECH2008, pges , [10] S. Ntori, et l. Jpnese spoken term detection using syllble trnsition network derived from multiple speech recognizers outputs. In Proc. of INTERSPEECH2010, pp , [11] S. Ntori, et l. Network-formed index from multiple speech recognizers outputs on spoken term detection. In Proc. of APSIPA ASC 2010 (student symposium), pge 1, [12] The spoken term detection (STD) 2006 evlution pln, tests/std/2006/docs/std06-evlpln-v10.pdf [13] A. Singhl, et l. Pivoted document length normliztion. In Proc. of ACM SIGIR 96, pp , [14] M. Utiym nd H. Ishr. A Sttisticl Model for Domin-Independent Text Segmenttion. In Proc.of the 9th ECACL, pp , [15] T. Utsuro, et l. An empiricl study on multiple LVCSR model combintion by mchine lerning. In Proc. of HLT-NAACL 2004, pp.13 16, [16] D. Vergyri, et l. The SRI/OGI 2006 spoken term detection system. In Proc. of INTERSPEECH2007, pp , [17] D. Wng, et l. Stochstic pronuncition modelling for out-of-vocbulry spoken term detection. IEEE Trns. on Audio, Speech, nd Lnguge Processing, 19(4): ,

Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask

Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask NTCIR-9 Workshop: SpokenDoc Spoken Term Detection Using Multiple Speech Recognizers Outputs at NTCIR-9 SpokenDoc STD subtask Hiromitsu Nishizaki Yuto Furuya Satoshi Natori Yoshihiro Sekiguchi University

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Statistical classification of spatial relationships among mathematical symbols

Statistical classification of spatial relationships among mathematical symbols 2009 10th Interntionl Conference on Document Anlysis nd Recognition Sttisticl clssifiction of sptil reltionships mong mthemticl symbols Wl Aly, Seiichi Uchid Deprtment of Intelligent Systems, Kyushu University

More information

The Distributed Data Access Schemes in Lambda Grid Networks

The Distributed Data Access Schemes in Lambda Grid Networks The Distributed Dt Access Schemes in Lmbd Grid Networks Ryot Usui, Hiroyuki Miygi, Yutk Arkw, Storu Okmoto, nd Noki Ymnk Grdute School of Science for Open nd Environmentl Systems, Keio University, Jpn

More information

Selection of Best Match Keyword using Spoken Term Detection for Spoken Document Indexing

Selection of Best Match Keyword using Spoken Term Detection for Spoken Document Indexing Selection of Best Match Keyword using Spoken Term Detection for Spoken Document Indexing Kentaro Domoto, Takehito Utsuro, Naoki Sawada and Hiromitsu Nishizaki Graduate School of Systems and Information

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM Interntionl Journl of Innovtive Reserch in Advnced Engineering (IJIRAE) Volume1 Issue1 (Mrch 2014) A Comprison of the Discretiztion Approch for CST nd Discretiztion Approch for VDM Omr A. A. Shib Fculty

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

vcloud Director Service Provider Admin Portal Guide vcloud Director 9.1

vcloud Director Service Provider Admin Portal Guide vcloud Director 9.1 vcloud Director Service Provider Admin Portl Guide vcloud Director 9. vcloud Director Service Provider Admin Portl Guide You cn find the most up-to-dte technicl documenttion on the VMwre website t: https://docs.vmwre.com/

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis On the Detection of Step Edges in Algorithms Bsed on Grdient Vector Anlysis A. Lrr6, E. Montseny Computer Engineering Dept. Universitt Rovir i Virgili Crreter de Slou sin 43006 Trrgon, Spin Emil: lrre@etse.urv.es

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

If f(x, y) is a surface that lies above r(t), we can think about the area between the surface and the curve.

If f(x, y) is a surface that lies above r(t), we can think about the area between the surface and the curve. Line Integrls The ide of line integrl is very similr to tht of single integrls. If the function f(x) is bove the x-xis on the intervl [, b], then the integrl of f(x) over [, b] is the re under f over the

More information

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2012 Colin Dewey cdewey@biostt.wisc.edu Gols for Lecture the key concepts to understnd re the following how lrge-scle lignment

More information

Text mining: bag of words representation and beyond it

Text mining: bag of words representation and beyond it Text mining: bg of words representtion nd beyond it Jsmink Dobš Fculty of Orgniztion nd Informtics University of Zgreb 1 Outline Definition of text mining Vector spce model or Bg of words representtion

More information

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1 Mth 33 Volume Stewrt 5.2 Geometry of integrls. In this section, we will lern how to compute volumes using integrls defined by slice nlysis. First, we recll from Clculus I how to compute res. Given the

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

A Transportation Problem Analysed by a New Ranking Method

A Transportation Problem Analysed by a New Ranking Method (IJIRSE) Interntionl Journl of Innovtive Reserch in Science & Engineering ISSN (Online) 7-07 A Trnsporttion Problem Anlysed by New Rnking Method Dr. A. Shy Sudh P. Chinthiy Associte Professor PG Scholr

More information

ECE 468/573 Midterm 1 September 28, 2012

ECE 468/573 Midterm 1 September 28, 2012 ECE 468/573 Midterm 1 September 28, 2012 Nme:! Purdue emil:! Plese sign the following: I ffirm tht the nswers given on this test re mine nd mine lone. I did not receive help from ny person or mteril (other

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

4452 Mathematical Modeling Lecture 4: Lagrange Multipliers

4452 Mathematical Modeling Lecture 4: Lagrange Multipliers Mth Modeling Lecture 4: Lgrnge Multipliers Pge 4452 Mthemticl Modeling Lecture 4: Lgrnge Multipliers Lgrnge multipliers re high powered mthemticl technique to find the mximum nd minimum of multidimensionl

More information

Misrepresentation of Preferences

Misrepresentation of Preferences Misrepresenttion of Preferences Gicomo Bonnno Deprtment of Economics, University of Cliforni, Dvis, USA gfbonnno@ucdvis.edu Socil choice functions Arrow s theorem sys tht it is not possible to extrct from

More information

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism Efficient K-NN Serch in Polyphonic Music Dtses Using Lower Bounding Mechnism Ning-Hn Liu Deprtment of Computer Science Ntionl Tsing Hu University Hsinchu,Tiwn 300, R.O.C 886-3-575679 nhliou@yhoo.com.tw

More information

Computing offsets of freeform curves using quadratic trigonometric splines

Computing offsets of freeform curves using quadratic trigonometric splines Computing offsets of freeform curves using qudrtic trigonometric splines JIULONG GU, JAE-DEUK YUN, YOONG-HO JUNG*, TAE-GYEONG KIM,JEONG-WOON LEE, BONG-JUN KIM School of Mechnicl Engineering Pusn Ntionl

More information

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE 3.1 Scheimpflug Configurtion nd Perspective Distortion Scheimpflug criterion were found out to be the best lyout configurtion for Stereoscopic PIV, becuse

More information

x )Scales are the reciprocal of each other. e

x )Scales are the reciprocal of each other. e 9. Reciprocls A Complete Slide Rule Mnul - eville W Young Chpter 9 Further Applictions of the LL scles The LL (e x ) scles nd the corresponding LL 0 (e -x or Exmple : 0.244 4.. Set the hir line over 4.

More information

Efficient Regular Expression Grouping Algorithm Based on Label Propagation Xi Chena, Shuqiao Chenb and Ming Maoc

Efficient Regular Expression Grouping Algorithm Based on Label Propagation Xi Chena, Shuqiao Chenb and Ming Maoc 4th Ntionl Conference on Electricl, Electronics nd Computer Engineering (NCEECE 2015) Efficient Regulr Expression Grouping Algorithm Bsed on Lbel Propgtion Xi Chen, Shuqio Chenb nd Ming Moc Ntionl Digitl

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

pdfapilot Server 2 Manual

pdfapilot Server 2 Manual pdfpilot Server 2 Mnul 2011 by clls softwre gmbh Schönhuser Allee 6/7 D 10119 Berlin Germny info@cllssoftwre.com www.cllssoftwre.com Mnul clls pdfpilot Server 2 Pge 2 clls pdfpilot Server 2 Mnul Lst modified:

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 2. Ben Raphael January 26, hhp://cs.brown.edu/courses/csci1950 z/ Outline

CSCI1950 Z Computa4onal Methods for Biology Lecture 2. Ben Raphael January 26, hhp://cs.brown.edu/courses/csci1950 z/ Outline CSCI1950 Z Comput4onl Methods for Biology Lecture 2 Ben Rphel Jnury 26, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Outline Review of trees. Coun4ng fetures. Chrcter bsed phylogeny Mximum prsimony Mximum

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08 CS412/413 Introduction to Compilers Tim Teitelum Lecture 4: Lexicl Anlyzers 28 Jn 08 Outline DFA stte minimiztion Lexicl nlyzers Automting lexicl nlysis Jlex lexicl nlyzer genertor CS 412/413 Spring 2008

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys

Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys Sclble Peer-to-Peer Web Retrievl with Highly Discrimintive Keys Ivn Podnr, Mrtin Rjmn, Ton Luu, Fbius Klemm, Krl Aberer School of Computer nd Communiction Sciences Ecole Polytechnique Fédérle de Lusnne

More information

A REINFORCEMENT LEARNING APPROACH TO SCHEDULING DUAL-ARMED CLUSTER TOOLS WITH TIME VARIATIONS

A REINFORCEMENT LEARNING APPROACH TO SCHEDULING DUAL-ARMED CLUSTER TOOLS WITH TIME VARIATIONS A REINFORCEMENT LEARNING APPROACH TO SCHEDULING DUAL-ARMED CLUSTER TOOLS WITH TIME VARIATIONS Ji-Eun Roh (), Te-Eog Lee (b) (),(b) Deprtment of Industril nd Systems Engineering, Kore Advnced Institute

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association 9. CISC - Curriculum & Instruction Steering Committee The Winning EQUATION A HIGH QUALITY MATHEMATICS PROFESSIONAL DEVELOPMENT PROGRAM FOR TEACHERS IN GRADES THROUGH ALGEBRA II STRAND: NUMBER SENSE: Rtionl

More information

Stained Glass Design. Teaching Goals:

Stained Glass Design. Teaching Goals: Stined Glss Design Time required 45-90 minutes Teching Gols: 1. Students pply grphic methods to design vrious shpes on the plne.. Students pply geometric trnsformtions of grphs of functions in order to

More information

Chapter 2 Sensitivity Analysis: Differential Calculus of Models

Chapter 2 Sensitivity Analysis: Differential Calculus of Models Chpter 2 Sensitivity Anlysis: Differentil Clculus of Models Abstrct Models in remote sensing nd in science nd engineering, in generl re, essentilly, functions of discrete model input prmeters, nd/or functionls

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

Assignment 4. Due 09/18/17

Assignment 4. Due 09/18/17 Assignment 4. ue 09/18/17 1. ). Write regulr expressions tht define the strings recognized by the following finite utomt: b d b b b c c b) Write FA tht recognizes the tokens defined by the following regulr

More information

Tool Vendor Perspectives SysML Thus Far

Tool Vendor Perspectives SysML Thus Far Frontiers 2008 Pnel Georgi Tec, 05-13-08 Tool Vendor Perspectives SysML Thus Fr Hns-Peter Hoffmnn, Ph.D Chief Systems Methodologist Telelogic, Systems & Softwre Modeling Business Unit Peter.Hoffmnn@telelogic.com

More information

UNIT 11. Query Optimization

UNIT 11. Query Optimization UNIT Query Optimiztion Contents Introduction to Query Optimiztion 2 The Optimiztion Process: An Overview 3 Optimiztion in System R 4 Optimiztion in INGRES 5 Implementing the Join Opertors Wei-Png Yng,

More information

Parallel Square and Cube Computations

Parallel Square and Cube Computations Prllel Squre nd Cube Computtions Albert A. Liddicot nd Michel J. Flynn Computer Systems Lbortory, Deprtment of Electricl Engineering Stnford University Gtes Building 5 Serr Mll, Stnford, CA 945, USA liddicot@stnford.edu

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

A Scalable and Reliable Mobile Agent Computation Model

A Scalable and Reliable Mobile Agent Computation Model A Sclble nd Relible Mobile Agent Computtion Model Yong Liu, Congfu Xu, Zhohui Wu, nd Yunhe Pn College of Computer Science, Zhejing University Hngzhou 310027, Chin cckffe@yhoo.com.cn Abstrct. This pper

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-169 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

HW Stereotactic Targeting

HW Stereotactic Targeting HW Stereotctic Trgeting We re bout to perform stereotctic rdiosurgery with the Gmm Knife under CT guidnce. We instrument the ptient with bse ring nd for CT scnning we ttch fiducil cge (FC). Above: bse

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS COMPUTATION & LOGIC Sturdy st April 7 : to : INSTRUCTIONS TO CANDIDATES This is tke-home exercise. It will not

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

An Efficient and Effective Case Classification Method Based On Slicing

An Efficient and Effective Case Classification Method Based On Slicing An Efficient nd Effective Cse Clssifiction Method Bsed On Slicing An Efficient nd Effective Cse Clssifiction Method Bsed On Slicing Omr A. A. Shib, Md. Nsir Sulimn, Ali Mmt nd Ftimh Ahmd Fculty of Computer

More information

PYTHON PROGRAMMING. The History of Python. Features of Python. This Course

PYTHON PROGRAMMING. The History of Python. Features of Python. This Course The History of Python PYTHON PROGRAMMING Dr Christin Hill 7 9 November 2016 Invented by Guido vn Rossum* t the Centrum Wiskunde & Informtic in Amsterdm in the erly 1990s Nmed fter Monty Python s Flying

More information

Preserving Constraints for Aggregation Relationship Type Update in XML Document

Preserving Constraints for Aggregation Relationship Type Update in XML Document Preserving Constrints for Aggregtion Reltionship Type Updte in XML Document Eric Prdede 1, J. Wenny Rhyu 1, nd Dvid Tnir 2 1 Deprtment of Computer Science nd Computer Engineering, L Trobe University, Bundoor

More information

3.5.1 Single slit diffraction

3.5.1 Single slit diffraction 3..1 Single slit diffrction ves pssing through single slit will lso diffrct nd produce n interference pttern. The reson for this is to do with the finite width of the slit. e will consider this lter. Tke

More information

A Fully Automated Object Extraction System. for the World Wide Web. David Buttler, Ling Liu, Calton Pu. Georgia Institute of Technology

A Fully Automated Object Extraction System. for the World Wide Web. David Buttler, Ling Liu, Calton Pu. Georgia Institute of Technology A Fully Automted Object Exction System for the World Wide Web Dvid Buttler, Ling Liu, Clton Pu Georgi Institute of Technology College of Computing Atlnt, GA 30332, U.S.A. fbuttler, lingliu, cltong@cc.gtech.edu

More information

DYNAMIC DISCRETIZATION: A COMBINATION APPROACH

DYNAMIC DISCRETIZATION: A COMBINATION APPROACH DYNAMIC DISCRETIZATION: A COMBINATION APPROACH FAN MIN, QIHE LIU, HONGBIN CAI, AND ZHONGJIAN BAI School of Computer Science nd Engineering, University of Electronic Science nd Technology of Chin, Chengdu

More information

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications.

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications. 15-112 Fll 2018 Midterm 1 October 11, 2018 Nme: Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

Midterm I Solutions CS164, Spring 2006

Midterm I Solutions CS164, Spring 2006 Midterm I Solutions CS164, Spring 2006 Februry 23, 2006 Plese red ll instructions (including these) crefully. Write your nme, login, SID, nd circle the section time. There re 8 pges in this exm nd 4 questions,

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

Tixeo compared to other videoconferencing solutions

Tixeo compared to other videoconferencing solutions compred to other videoconferencing solutions for V171026EN , unique solution on the video conferencing field Adobe Connect Web RTC Vydio for High security level, privcy Zero impct on network security policies

More information

Overview. Network characteristics. Network architecture. Data dissemination. Network characteristics (cont d) Mobile computing and databases

Overview. Network characteristics. Network architecture. Data dissemination. Network characteristics (cont d) Mobile computing and databases Overview Mobile computing nd dtbses Generl issues in mobile dt mngement Dt dissemintion Dt consistency Loction dependent queries Interfces Detils of brodcst disks thlis klfigopoulos Network rchitecture

More information

CS 321 Programming Languages and Compilers. Bottom Up Parsing

CS 321 Programming Languages and Compilers. Bottom Up Parsing CS 321 Progrmming nguges nd Compilers Bottom Up Prsing Bottom-up Prsing: Shift-reduce prsing Grmmr H: fi ; fi b Input: ;;b hs prse tree ; ; b 2 Dt for Shift-reduce Prser Input string: sequence of tokens

More information

Expected Worst-case Performance of Hash Files

Expected Worst-case Performance of Hash Files Expected Worst-cse Performnce of Hsh Files Per-Ake Lrson Deprtment of Informtion Processing, Abo Akdemi, Fnriksgtn, SF-00 ABO 0, Finlnd The following problem is studied: consider hshfilend the longest

More information

Pointwise convergence need not behave well with respect to standard properties such as continuity.

Pointwise convergence need not behave well with respect to standard properties such as continuity. Chpter 3 Uniform Convergence Lecture 9 Sequences of functions re of gret importnce in mny res of pure nd pplied mthemtics, nd their properties cn often be studied in the context of metric spces, s in Exmples

More information

Vulnerability Analysis of Electric Power Communication Network. Yucong Wu

Vulnerability Analysis of Electric Power Communication Network. Yucong Wu 2nd Interntionl Conference on Advnces in Mechnicl Engineering nd Industril Informtics (AMEII 2016 Vulnerbility Anlysis of Electric Power Communiction Network Yucong Wu Deprtment of Telecommunictions Engineering,

More information

A HYDRAULIC SIMULATOR FOR AN EXCAVATOR

A HYDRAULIC SIMULATOR FOR AN EXCAVATOR P-06 Proceedings of the 7th JFPS Interntionl Symposium on Fluid Power TOYAMA 008 September 5-8 008 A HYDRAULIC SIMULATOR FOR AN EXCAVATOR Soon-Kwng Kwon* Je-Jun Kim* Young-Mn Jung* Chn-Se Jung* Chng-Don

More information

MATH 2530: WORKSHEET 7. x 2 y dz dy dx =

MATH 2530: WORKSHEET 7. x 2 y dz dy dx = MATH 253: WORKSHT 7 () Wrm-up: () Review: polr coordintes, integrls involving polr coordintes, triple Riemnn sums, triple integrls, the pplictions of triple integrls (especilly to volume), nd cylindricl

More information

Computer-Aided Multiscale Modelling for Chemical Process Engineering

Computer-Aided Multiscale Modelling for Chemical Process Engineering 17 th Europen Symposium on Computer Aided Process Engineesing ESCAPE17 V. Plesu nd P.S. Agchi (Editors) 2007 Elsevier B.V. All rights reserved. 1 Computer-Aided Multiscle Modelling for Chemicl Process

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-188 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

CPSC 213. Polymorphism. Introduction to Computer Systems. Readings for Next Two Lectures. Back to Procedure Calls

CPSC 213. Polymorphism. Introduction to Computer Systems. Readings for Next Two Lectures. Back to Procedure Calls Redings for Next Two Lectures Text CPSC 213 Switch Sttements, Understnding Pointers - 2nd ed: 3.6.7, 3.10-1st ed: 3.6.6, 3.11 Introduction to Computer Systems Unit 1f Dynmic Control Flow Polymorphism nd

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning

DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning DQL: A New Updting Strtegy for Reinforcement Lerning Bsed on Q-Lerning Crlos E. Mrino 1 nd Edurdo F. Morles 2 1 Instituto Mexicno de Tecnologí del Agu, Pseo Cuhunáhuc 8532, Jiutepec, Morelos, 6255, MEXICO

More information

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID:

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID: Fll term 2012 KAIST EE209 Progrmming Structures for EE Mid-term exm Thursdy Oct 25, 2012 Student's nme: Student ID: The exm is closed book nd notes. Red the questions crefully nd focus your nswers on wht

More information

CSEP 573 Artificial Intelligence Winter 2016

CSEP 573 Artificial Intelligence Winter 2016 CSEP 573 Artificil Intelligence Winter 2016 Luke Zettlemoyer Problem Spces nd Serch slides from Dn Klein, Sturt Russell, Andrew Moore, Dn Weld, Pieter Abbeel, Ali Frhdi Outline Agents tht Pln Ahed Serch

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

Comparative Study of Universities Web Structure Mining

Comparative Study of Universities Web Structure Mining Comprtive Study of Universities Web Structure Mining Z. Abdullh, A. R. Hmdn Abstrct This pper is ment to nlyze the rnking of University of Mlysi Terenggnu, UMT s website in the World Wide Web. There re

More information