A Collaborative Speech Transcription System for Live Streaming

Size: px

Start display at page:

Download "A Collaborative Speech Transcription System for Live Streaming"

Donald Lucas
5 years ago
Views:

1 ustream Yourscribe PodCastle Yourscribe 2.0 A Collaborative Speech Transcription System for Live Streaming Shunsuke Ukita, 1 Jun Ogata, 2 Masataka Goto 2 and Tetsunori Kobayashi 1 In this paper, we propose a real-time transcription system, Yourscribe, that enables anonymous users to collaboratively transcribe speech in live video streaming like ustream. In previous approaches, transcription by human was laborsome and transcription by speech recognition was error-prone. PodCastle was developed to overcome such errors of speech recognition by having anonymous users correct errors, but was not appropriate for real-time transcription. To use Yourscribe, each user can just voluntarily type a short phrase that was heard while enjoying the live video without any interruption. Those phrases from many users were aggregated at all times and matched with results of realtime speech recognition to automatically form the transcription. This is a new instance of our research approach, Speech Recognition Research ustream 1 2 Web ustream twitter 3 Web 1)2)3)4)5)6) 100% Web PodCastle 7)8)9) ) PodCastle 1 Waseda University 2 National Institute of Advanced Industrial Science and Technology (AIST) c 2011 Information Processing Society of Japan

2 Yourscribe Yourscribe 2 Yourscribe Yourscribe Yourscribe Web Yourscribe Web Web ustream ustream twitter Yourscribe ustream 1!!!! Yourscribe : 1 Yourscribe Yourscribe Yourscribe 2 c 2011 Information Processing Society of Japan

3 Yourscribe 3. Yourscribe Yourscribe Web Web Web Yourscribe 1 ( ) 3.2 t e t e T delay t e T delay () DP DP (HMM: ) HMM 3 c 2011 Information Processing Society of Japan

Web 11) ( ) 3.2.2 HMM t e T delay (HMM) HMM HMM HMM Viterbi ) 3.2.3 HMM 2 4.

26 PodCastle 12)13) CSJ 600 3000 1 16 tied-state cross-word triphone 39 PLP(12 PLP )

4 Web 11) ( ) HMM t e T delay (HMM) HMM HMM HMM Viterbi ) HMM ustream Web Web 3 A B C Web % 75% PodCastle 12)13) CSJ tied-state cross-word triphone 39 PLP(12 PLP ) CMLLR 14). Web N-gram 11) Web CSJ HMM CSJ 32 monophone triphone monophone 4 c 2011 Information Processing Society of Japan

5 2 1 () + A 77.88% 85.66% B 47.42% 73.00% C 31.90% 60.25% ( ) () 25% 50% 75% A 80.47% 82.65% 85.66% B 58.57% 65.43% 73.00% C 42.18% 52.53% 60.25% T delay 10 1 C 3 50% monophone HMM 75% 50% 25% 75% 50% 25% )16)1)2) 15)16) 17) 18) 19) 3)20) 4)5)6) 21) 16) ( ) 7) PodCastle PodCastle Yourscribe PodCastle Web Web Yourscribe 5 c 2011 Information Processing Society of Japan

6 1 torne 2 (twitter ) 6. Yourscribe Web Yourscribe 2.0 7)8)9) PodCastle 1) Chen, S.S., Eide, E.M., Gales, M.J., Gopinath, R.A., Kanevsky, D. and Olsen, P. A.: Recent Improvements to IBM s Speech Recognition System for Automatic Transcription of Broadcast News, Proc. ICASSP 99, Vol.1, pp (1999). 2) Woodland, P.C., Gales, M.J., Pye, D. and Young, S.J.: Broadcast News Transcription Using HTK, Proc. ICASSP 97, Vol.2, pp (1997). 3) Glass, J., Hazen, T.J., Cyphers, S., Malioutov, I., Huynh, D. and Barzilay, R.: Recent Progress in the MIT Spoken Lecture Processing Project, Proc. of Interspeech 2007, pp (2007). 4) Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A. and Wooters, C.: The ICSI Meeting Corpus, Proc. ICASSP 2003, Vol.1, pp (2003). 5) Metze, F., Waibel, A., Bett, M., Ries, K., Schaaf, T., Schultz, T., Soltau, H., Yu, H. and Zechner, K.: Advances in Automatic Meeting Record Creation and Access, Proc. ICASSP 2001, Vol.1, pp (2001). 6) Yu, H., Clark, C., Malkin, R. and Waibel, A.: Experiments in Automatic Meeting Transcription Using JRTk, Proc. ICASSP 98, Vol.2, pp (1998). 7) PodCastle: Vol.25, No.1, pp (2010). 8) Goto, M., Ogata, J. and Eto, K.: PodCastle: A Web 2.0 Approach to Speech Recognition Research, Proc. of Interspeech 2007, pp (2007). 9) Ogata, J. and Goto, M.: PodCastle: Collaborative Training of Acoustic Models on the Basis of Wisdom of Crowds for Podcast Transcription, Proc. of Interspeech 2009, pp (2009). 10) Goto, M. and Ogata, J.: PodCastle: A Spoken Document Retrieval Service Improved by Anonymous User Contributions, Proc. of PACLIC 24, pp.3 11 (2010). 11) PodCastle: Web pp (2008). 12) Ogata, J., Goto, M. and Eto, K.: Automatic Transcription for a Web 2.0 Service to Search Podcasts, Proc. of Interspeech 2007, pp (2007). 13) PodCastle: 2010-SLP-84-2 (2010). 14) Gales, M. J.F.: Maximal likelihood linear transformations for HMM-Based speech recognition, Computer Speech & Language, Vol.12, pp (1998). 15) 2 pp (2008). 16) (D-II) Vol.J84-D-II, pp (2001). 17) Akita, Y., Mimura, M. and Kawahara, T.: Automatic Transcription System for Meetings of the Japanese National Congress, Proc. of Interspeech 2009 (2009). 18) Neubig, G SLP-84-3 (2010). 19) Kawahara, T., Nanjo, H., Shinozaki, T. and Furui, S.: Benchmark test for speech recognition using the corpus of spontaneous japanese, Proc. SSPR 2003 (2003). 20) 2 pp.7 14 (2008). 21) Ogata, J. and Goto, M.: Speech Repair: Quick Error Correction Just by Using Selection Operation for Speech Input Interfaces, Proc. of Interspeech 2005, pp (2005). 6 c 2011 Information Processing Society of Japan

PodCastle: A Spoken Document Retrieval Service Improved by Anonymous User Contributions

PACLIC 24 Proceedings 3 PodCastle: A Spoken Document Retrieval Service Improved by Anonymous User Contributions Masataka Goto and Jun Ogata National Institute of Advanced Industrial Science and Technology