Internationalizing Speech Synthesis

Similar documents
Introducing the VoiceXML Server

Multilingual Aspects in Speech and Multimodal Interfaces. Paolo Baggia Director of International Standards

Special Lecture (406) Spoken Language Dialog Systems Introduction to VoiceXML

Integrate Speech Technology for Hands-free Operation

The W3C Emotion Incubator Group

Department of Information, Evidence & Research. International Clinical Trials Registry Platform. ICTRP Search Portal Revisions Document

WAP-Speech: Deriving Synergy between WAP and the Spoken Dialog Interface

EVALITA 2009: Loquendo Spoken Dialog System

Apple Inc. US 6,587,904 US 6,618,785 US 6,636,914 US 6,639,918 US 6,718,497 US 6,831,928 US 6,842,805 US 6,865,632 US 6,944,705 US 6,985,981

Aristech Documentation

Using control sequences

Implementation of ASR4CRM : An Automated Speech- Enabled Customer Care Service System

Standards for Multimodal Interaction: Activities in the W3C Multimodal Interaction Working Group. Group

CIC Text to Speech Engines

2004 NASCIO Recognition Awards. Nomination Form

Voice Quality (Russian) Speed of Speech. Adjustable from -10 to 10 (with 0 being the optimal option with average speech speed)

EPUB - LIFCO DICTIONARY ENGLISH TO TAMIL DOWNLOAD

Loquendo TTS Director: The next generation prompt-authoring suite for creating, editing and checking prompts

Oracle Linux and Oracle VM Support Policies ~ Statement of Changes Effective Date: 20-April-2018

Text To Speech Engines for IC

Committee on WIPO Standards (CWS)

Manuel E. Maisog Partner

Supporting 1,000+ Languages?

Data formats for exchanging classifications UNSD

VoiceXML for Business Applications: A Survey

ISO INTERNATIONAL STANDARD. Language resources management Multilingual information framework

HPE Secur & HPE Secur Cloud

Example. Section: PS 709 Examples of Calculations of Reduced Hours of Work Last Revised: February 2017 Last Reviewed: February 2017 Next Review:

It Internationalized ti Domain Names W3C Track: An International Web

IMS Assessment Interoperability Update

IXPs and IPv6 in the Asia Pacific: An Update

EMMA: Extensible MultiModal Annotation

April 1, 2018 ATSC Attachment 1 Page 1 of 12 LG Electronics Inc.

Network Working Group Internet-Draft October 27, 2007 Intended status: Experimental Expires: April 29, 2008

mod_unimrcp About Compatibility

April 1, 2019 ATSC Attachment 1 Page 1 of 12 LG Electronics Inc.

Computer Grade 5. Unit: 1, 2 & 3 Total Periods 38 Lab 10 Months: April and May

Web & Automotive. Paris, April Dave Raggett

OSIG Change History Article

RLAT Rapid Language Adaptation Toolkit

Beta Draft. Java Speech Markup Language Specification. Version 0.5 August 28, 1997

Marketing Opportunities

Obsoletes: 2070, 1980, 1942, 1867, 1866 Category: Informational June 2000

AMS API Modifications

Keyboards for inputting Chinese Language: A study based on US Patents

Efforts and Lessons on Improving Usability and Acceptance of IDNs in China

/Internet Random Moment Sampling. STATE OF ALASKA Department of Health and Social Services Division of Public Assistance

Speech Applications. How do they work?

On the Potential of Web Services in Network Management

A NOVEL MECHANISM FOR MEDIA RESOURCE CONTROL IN SIP MOBILE NETWORKS

Table of Checkpoints for User Agent Accessibility Guidelines 1.0

Quarterly Programmatic Report (November, 2013 to February, 2014)

VoiceXML. Installation and Configuration Guide. Interactive Intelligence Customer Interaction Center (CIC) Version 2016 R4

Whole World OLIF. Version 3.0 of the Versatile Language Data Format

Multi-modal Web IBM Position

Grid Code Planner EU Code Modifications GC0100/101/102/104

SC22/WG20 N677 Date: May 12, 1999

Media Resource Control Protocol v2

Design Room ONE Release Notes

GMF 2.0 Europa Simultaneous Release

ISO/IEC JTC 1 Update. April 2018 Phil Wennblom, Chair

IEEE P1564 Voltage Sag Indices Task Force Meeting

Designing in Text-To-Speech Capability for Portable Devices

Progress Report on the RA II WIGOS Project to Develop Support for NMHSs in Satellite Data, Products and Training

Adopting HTML5 for Television: Next Steps

IB Event Calendar Please check regularly for updates Last Update: April 30, 2013

Presented by AI Yuxin, Programme Officer, CSAM

This work is licensed under the Creative Commons Attribution 4.0 International License. Page 1 of 10

Hands-Free Internet using Speech Recognition

Lecture-14 Lookup Functions

ONAP Release Planning

An OASIS White Paper. Open by Design. The Advantages of the OpenDocument Format (ODF) ##### D R A F T ##### By the OASIS ODF Adoption TC For OASIS

By submitting your content to Pregame magazine, you agree to the Contributor Terms as outlined at

BCI Principles & Criteria: Revision

Printed Circuit Board Development Automation

Create Swift mobile apps with IBM Watson services IBM Corporation

VoiceXML Tutorial. Felix Burkhardt, written around 2002

VClarity Voice Platform

Working Draft v0.3 October 21 st 2001

"The way of the world is meeting people through other people." Term paper prepared by: Nagesh Yedaroor E-Business Technology BCM SS-09

High Visibility Enforcement TN Grants Tip Sheets

CYBERTECH MIDWEST Indianapolis, Indiana

Hello. Welcome to Loqu8 ice Learn Chinese. Start understanding and learning Chinese from your mouse.

OASIS Organization for the Advancement of Structured Information Standards Emergency Management Technical Committee (EMTC)

Tamil Image Text to Speech Using Rasperry PI

W3C XG USDL Introduction

Advanced Syllabus 2007 Release Plan

Information for DIBP Officers 2015

2015 China Future Network Development and Innovation Forum Held

Deliverable D ASFE-DL: Semantics, Syntaxes and Stylistics (R3)

RPM International Inc. Hotline Instructions

Oracle Hardware and Systems Support Policies ~ Statement of Changes Effective Date: 20-April-2018

Payment Card Industry Data Security Standard (PCI DSS) Payment Application Data Security Standard (PA-DSS) Summary of 2012 Feedback

Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages

TWNIC Chinese EAI Promotion ICANN /03/14

ETSI TS V7.3.1 ( ) Technical Specification

Temporal Information Access (Temporalia-2)

RFC Editor Reporting April 2013

WHOIS Studies Update. Liz Gasster

Inventions on using LDAP for different purposes- Part-3

Transcription:

Internationalizing Speech Synthesis Daniel C. Burnett, Vexeo Zhiwei Shuang, IBM Co-chairs of W3C SSML 1.1 Subgroup

Speech Synthesis Markup Language 1.0 The Speech Synthesis Markup Language 1.0 (SSML1.0), a W3C Recommendation since 2004, is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms.

SSML needs to support more languages! It is estimated that within 3 years the World Wide Web will contain significantly more content from currently under-represented languages, e.g. Mandarin Chinese and Hindi. In order to make SSML more useful in current and emerging markets, more enhancements for handling those languages are required. Many other languages would also benefit from a new "international" version of SSML, and it would help spread the Web to places where it is not so readily accessible.

From SSML 1.0 to SSML 1.1 SSML 1.1 improves on W3C's SSML 1.0 Recommendation by adding support for more conventions and practices of the world's languages. Helps to disambiguate "word boundaries" in languages that do not use whitespace as a word boundary, including Chinese, Thai, and Japanese. SSML 1.1 allows references to language-specific pronunciation alphabets. It provides finer-grained control over lexicon activation and entry usage. It clarifies the relationship between the author's specified speaking voice and the language being spoken. Provides features to better integrate with existing and upcoming Speech Interface Framework specifications.

Word Element In SSML 1.0: p -paragraph, s -sentence In SSML 1.1: w element added to disambiguate "word boundaries"

Language-specific pronunciation alphabets In SSML 1.0: The only valid values for the alphabet attribute are "ipa" (see the next paragraph) and vendor-defined strings of the form "xorganization" or "x-organization-alphabet". However, ipa is not convenient or popular for Chinese & many other languages. In SSML 1.1 We are planning to create a registry which includes pronunciation alphabets to be used with "alphabet" attribute of "phoneme" in SSML.

Finer-grained control over lexicon In SSML 1.0: Reference one or more external pronunciation lexicon documents for whole SSML document. In SSML 1.1 lookup element to specify the lexicon

Voice Element In SSML 1.0 voice element is twisted with xml:lang, the change of language will probably change the voice. The output of Mix Language Text is uncontrollable. In SSML 1.1 Clarifies the relationship between the author's specified speaking voice and the language being spoken. Change of xml:lang should not change the voice! Example of SSML 1.1: <voice languages="en:zh,zh:zh"> I come from <lang xml:lang="zh-cn"> 中国 </lang>. </voice>

SSML 1.1 Participants Editors: Voxeo IBM Authors: Nuance Communications Loquendo France Telecom Tellme Chinese Academy of Sciences Panasonic Toshiba HP Chinese University of Hong Kong iflytek Other Contributors: InterVoice Related W3C working groups

SSML Workshop & Meetings Three Workshop to gather requirements: Beijing Workshop, 2-3 November, 2005 Greece Workshop, 30-31 May, 2006 India Workshop, 13-14 January, 2007 Nine Face to Face Meetings Most recent one: 9th f2f Meeting in Beijing, 22-24 April, 2008 Other 8 f2f Meeting from April 2006-2008

SSML 1.1 Working Draft Status First Public Working Draft (10 January, 2007) Second Working Draft (11 June, 2007) Third Working Draft (4 September, 2007) Fourth Working Draft (12 December, 2007) Fifth Working Draft (17 March, 2008)

Thank you!