ISO/IEC JTC/1 SC/2 WG/2 N2312. ISO/IEC JTC/1 SC/2 WG/2 Universal Multiple-Octet Coded Character Set (UCS)

Similar documents
ZWJ requests that glyphs in the highest available category be used; ZWNJ requests that glyphs in the lowest available category be used.

Extensible Rendering for Complex Writing Systems

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

Introduction. Acknowledgements

AFP Support for TrueType/Open Type Fonts and Unicode

ISO/IEC JTC 1/SC 2/WG 2/N2789 L2/04-224

Proposal to Encode Phonetic Symbols with Retroflex Hook in the UCS

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7

Proposal to encode three Arabic characters for Arwi

5 U+1F641 SLIGHTLY FROWNING FACE

Proposal to Encode Oriya Fraction Signs in ISO/IEC 10646

A B Ɓ C D Ɗ Dz E F G H I J K Ƙ L M N Ɲ Ŋ O P R S T Ts Ts' U W X Y Z

5 U+1F641 SLIGHTLY FROWNING FACE

Bookmarks for PDF Output(Outline-Group)

ISO/IEC JTC1/SC2/WG2. Universal Multiple-Octet Coded Character Set (UCS) - ISO/IEC Secretariat: ANSI

Blending Content for South Asian Language Pedagogy Part 2: South Asian Languages on the Internet

Yes. Form number: N2652-F (Original ; Revised , , , , , , , )

ISO/IEC JTC1/SC2/WG2 N4445

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 22: Open Font Format

5c. Are the character shapes attached in a legible form suitable for review?

Proposal to encode Devanagari Sign High Spacing Dot

ISO/IEC JTC/1 SC/2 WG/2 N2095

ISO/IEC AMENDMENT

Proposal to encode fourteen Pakistani Quranic marks

L2/ ISO/IEC JTC1/SC2/WG2 N4671

Universal Multiple-Octet Coded Character Set UCS

Two distinct code points: DECIMAL SEPARATOR and FULL STOP

A. Administrative. B. Technical General

1. Introduction 2. TAMIL DIGIT ZERO JTC1/SC2/WG2 N Character proposed in this document About INFITT and INFITT WG

JTC1/SC2/WG2 N

Universal Multiple-Octet Coded Character Set UCS

ISO/IEC JTC1/SC2/WG2 N2641

1. Introduction 2. TAMIL LETTER SHA Character proposed in this document About INFITT and INFITT WG

Bringing ᬅᬓᬱᬭᬩᬮ to ios. Norbert Lindenberg

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 18: Font compression and streaming

SIL ViewGlyph Font Viewing Program

ISO/IEC JTC1/SC2/WG2 N4078

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC / UNICODE

If this proposal is adopted, the following three characters would exist: with the following properties (including a change from Ll to Lo for U+0294):

Introduction. Requests. Background. New Arabic block. The missing characters

Üù àõ [tai 2 l 6] (in older orthography Üù àõ»). Tai Le orthography is simple and straightforward:

1.1 The digit forms propagated by the Dozenal Society of Great Britain

ISO/IEC JTC1/SC2/WG2 N3734 L2/09-144R

1 ISO/IEC JTC1/SC2/WG2 N

1. Introduction. 2. Encoding Considerations

Keyman, LANGIDs & Codepages

0CE2 Ä KANNADA VOWEL SIGN VOCALIC L 0CE3 Å KANNADA VOWEL SIGN VOCALIC LL 0CE4 Ç KANNADA DANDA 0CE5 É KANNADA DOUBLE DANDA

Title: Application to include Arabic alphabet shapes to Arabic 0600 Unicode character set

Revised proposal to encode NEPTUNE FORM TWO. Eduardo Marin Silva 02/08/2017

Proposal to encode the DOGRA VOWEL SIGN VOCALIC RR

B. Technical General 1. Choose one of the following: 1a. This proposal is for a new script (set of characters) Yes.

Andrew Glass and Shriramana Sharma. anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com November-2

The Unicode Standard Version 11.0 Core Specification

JTC1/SC2/WG2 N4945R

ISO/IEC JTC1/SC2/WG2 N 2490

Proposal to encode the SANDHI MARK for Newa

transcribing Urdu or Arabic words. Accordingly, the KHHA and GHHA should be considered atomic, as Tibetan TSA, TSHA, and DZA are.

The Adobe-Japan1-6 Character Collection

ISO/IEC JTC 1/SC 2/WG 2 N3086 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS 1

ISO/IEC JTC/SC2/WG Universal Multiple Octet Coded Character Set (UCS)

A. Administrative. B. Technical General L2/ DATE:

FLT: Font Layout Table

097B Ä DEVANAGARI LETTER GGA 097C Å DEVANAGARI LETTER JJA 097E Ç DEVANAGARI LETTER DDDA 097F É DEVANAGARI LETTER BBA

1. Introduction. 2. Proposed Characters Block: Latin Extended-E Historic letters for Sakha (Yakut)

Use of ZWJ/ZWNJ with Mongolian Variant Selectors and Vowel Separator SOURCE: Paul Nelson and Asmus Freytag STATUS: Proposal

Extending SVG Fonts with Graphite

What s new since TEX?

ISO International Organization for Standardization Organisation Internationale de Normalisation

Proposal to encode Kannada Sign Spacing Candrabindu

ISO/IEC INTERNATIONAL STANDARD

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057

TECkit version 2.0 A Text Encoding Conversion toolkit

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document.

Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS

DATE: Time: 12:28 AM N SupportN2621 Page: 1 of 9 ISO/IEC JTC 1/SC 2/WG 2

John H. Jenkins If available now, identify source(s) for the font (include address, , ftp-site, etc.) and indicate the tools used:

Recent Trends in Standardization of Japanese Character Codes

INTERNATIONALIZATION IN GVIM

ISO/IEC JTC1/SC2/WG2 N4599 L2/

Information technology Keyboard layouts for text and office systems. Part 9: Multi-lingual, multiscript keyboard layouts

Software Applications for Cultural Diversity

ISO/IEC JTC1/SC2/WG2 N 3476

Communication through the language barrier in some particular circumstances by means of encoded localizable sentences

OpenType Font by Harsha Wijayawardhana UCSC

Fig. 2 of E.161 Fig. 3 of E.161 Fig. 4 of E.161

JTC1/SC2/WG

ISO/IEC JTC 1/SC 2 N 3354

A. Administrative. B. Technical General ISO/IEC JTC1/SC2/WG2 N1948

Reply to L2/10-327: Comments on L2/10-280, Proposal to Add Variation Sequences... 1

Comments on responses to objections provided in N2661

Yes 11) 1 Form number: N2652-F (Original ; Revised , , , , , , , 2003-

Proposal to encode ADLAM LETTER APOSTROPHE for ADLaM script

Rendering in Dzongkha

UNICODE IDEOGRAPHIC VARIATION DATABASE

The Adobe-CNS1-6 Character Collection

ISO/IEC JTC1/SC2 Nxxxx ISO/IEC JTC1/SC2/WG2/IRG N2214

PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC / UNICODE

ä + ñ ISO/IEC JTC1/SC2/WG2 N

Proposal to Encode An Outstanding Early Cyrillic Character in Unicode

Transcription:

ISO/IEC JTC/1 SC/2 WG/2 N2312 L2/01-025 2001-01-08 ISO/IEC JTC/1 SC/2 WG/2 Universal Multiple-Octet Coded Character Set (UCS) Title: Presentation of tone contours encoded as UCS tone letter sequences Doc. Type: Expert contribution Source: Peter Constable, SIL International Action: For consideration by WG2 References: WG2 N2307, N2195 Distribution: WG2 members Introduction Concern was expressed in section 3.3 of WG2 N2195 that the existing tone letter characters in the UCS (U+02E5..U+02E9) are not adequate to represent contour tones, such as high-rising, that occur in languages such as Cantonese. N2307 explains that the existing tone letter characters are, in fact, adequate for representing contour tones by the use of tone letter sequences. For example, a highrising tone (a 45 contour) can be represented using a sequence of characters 02E6+02E5. The concern expressed in N2195 appears to be motivated not by the need for an encoded representation of contour tones, but by the problem of appropriate presentation of an encoded representation using glyphs with contour shapes. This is a prototypical case of the distinction between characters and glyphs, as asserted by ISO/IEC TR 15285, and of the need for information systems to provide complex character sequence-to-glyph mappings as specified in TR 15285. I wish to support the opinion expressed in N2307, that the existing tone letter characters 02E5..02E9 are adequate for representing tone contours, by describing an existing implementation that is capable of presenting contour tone glyphs that are mapped from UCS tone letter sequences. Existing implementation: SIL Graphite system and IPA fonts SIL International has for several years distributed IPA fonts from our web site. These fonts were designed using to work with older technologies that assumed one-to-one character- to-glyph mappings (following the coded font model of TR 15285) and used a proprietary encoding. We are currently engaged in a new implementation of IPA fonts that conforms to the UCS encoding and that makes use of complex character-to-glyph mapping technologies (following the intelligent font model of TR 15285). In our initial implementation, complex character-to-glyph mapping has been implemented using the SIL Graphite font rendering system. The current implementation of this system provides a generic character-to-glyph mapping engine that is packaged as a COM object, which can be called by client application software. The engine makes use of finite-state tables found in a Graphite-enabled font. Such a font is created as an extension of the TrueType font format by adding the additional tables used by the Graphite engine. These additional tables define the script-specific mapping behaviour and are compiled from a high-level description of the script behaviour, written in the Graphite Description Language (GDL). (Complete information on the Graphite rendering system is available at http://www.sil.org/computing/graphite/.) The GDL code used for handling the presentation of 1

contour tones is provided below in Annex A. In this implementation, sequences of tone letter characters get mapped to single glyphs for the particular contour. This is exactly the same type of mapping that might be used for Arabic ligatures or Devanagari conjuncts. Figure 1 shows a screen shot from an application using the Graphite rendering system. It contains two lines of text. Both lines of text contain exactly the same tone letter characters from the range 02E5..02E9 and encoded in UTF-16. The only difference is that, in the first line, the tone letters are separated by space characters to prevent the selection of the ligated contour glyphs. Figure 1. Tone letter sequences rendered as contour tone glyphs. Note that contiguous sequences of tone letter characters are mapped to the appropriate tone contour glyphs. This is done without requiring any specific user action and without any additional markup or other information beyond the plain text. This font implementation will support contours corresponding to any combination of tone letters up to three characters long. Figure 2 shows a screen shot of a utility that can view the glyph palette of a TrueType font. In this screen shot, the complete inventory of contour glyphs available in the IPA font is shown. 2

Figure 2: inventory of contour tone glyphs in SIL IPA font All of these tone glyphs can be presented using the Graphite system from an underlying encoded character representation using only the tone letter characters 02E5..02E9. Note that Graphite is not the only system that can support a font implementation to handle presentation of contour tones. Apple Computer s AAT technology and Microsoft / Adobe s OpenType technology are equally capable of supporting similar implementations. We are considering the possibility of implementing our IPA fonts using these technologies in addition to the Graphite rendering technology in the future. Recommendations The existence of a working implementation demonstrates that the existing tone letter characters 02E5..02E9 are adequate for encoded representation and presentation of contour tones. It is recommended that presentation forms for contour tone letters not be added to the UCS. Rather, the needs of presentation of contour tone letters, and of many non-latin scripts, would be better served 3

by efforts to promote the development and adoption of technologies such as Graphite, AAT and OpenType that implement the intelligent font model described in TR 15285. Annex A: GDL code for presentation of contour tones The GDL code (pre-release version) for handling the presentation of contour tones is presented here. There are two parts to the code: a glyph table, which is used to define symbols for refering to individual glyphs or classes of glyphs, and a substitution table, which provides the actual mapping. (Note that an initial mapping from character codes to intial glyph identifiers is provided by the cmap table in the font. The GDL mappings are done entirely in terms of glyphs.) Only the classes defined in the glyph table have been duplicated here. For every symbolic glyph name used in defining the classes, a definition of that name would also have been provided in the glyph table. For example, the symbolic name g02e802e9 is associated with a particular glyph from the font using the following GDL statement: g02e802e9 = glyphid(380); (A glyph ID is a unique 16-bit integer identifier used in the TrueType font format to reference glyphs in a font s glyph palette.) The complete set of such rules has been suppressed here to conserve space. Only the definitions of glyph classes are shown here. The basic effect of a substitution rule such as cpitches g02e9 > _ cpitch0$1:(1 2) / _ ^ _; is this: whenever a glyph in the class cpitches is followed by the glyph g02e9 (which happens to be the default glyph mapped in the cmap table from the character U+02E9), this sequence of two glyphs is substituted by a glyph from the class cpitch0. The operator $1 specifies that the glyph to be selected from cpitch0 is determined by the index that was used for matching the glyph in the first position of the sequence. (More generally, $n would use the index applied to the nth position in the glyph sequence.) Thus, for example, if the glyph in the first position of the sequence matched the fifth glyph in the class cpitches, then the output glyph would be the fifth glyph in cpitch0. The remaining aspects of this substitution rule and the complete details on the GDL language are available at http://www.sil.org/computing/graphite/gdl.pdf. Rules related to tone contours, excerpted from SILDU.GDL: table(glyph) {MUnits = 1000}; //pitch glyph classes cpitch0 = (g02e9, g02e902e802e9, g02e902e702e9, g02e902e602e9, g02e902e502e9, g02e802e9, g02e802e902e9, g02e802e702e9, g02e802e602e9, g02e802e502e9, g02e702e9, g02e702e902e9, g02e702e9, g02e702e602e9, g02e702e502e9, g02e602e9, g02e602e902e9, g02e602e802e9, g02e602e702e9, g02e602e502e9, g02e502e9, g02e502e902e9, g02e502e802e9, g02e502e9, g02e502e602e9); cpitch1 = (g02e902e8, g02e902e802e8, g02e902e702e8, g02e902e602e8, g02e902e502e8, g02e8, g02e802e902e8, g02e802e702e8, 4

g02e802e602e8, g02e802e502e8, g02e702e8, g02e702e902e8, g02e702e802e8, g02e702e602e8, g02e702e502e8, g02e602e8, g02e602e902e8, g02e602e802e8, g02e602e8, g02e602e502e8, g02e502e8, g02e502e902e8, g02e502e802e8, g02e502e702e8, g02e502e602e8); cpitch2 = (g02e902e7, g02e902e7, g02e902e702e7, g02e902e602e7, g02e902e502e7, g02e802e7, g02e802e902e7, g02e802e702e7, g02e802e602e7, g02e802e502e7, g02e7, g02e702e902e7, g02e702e802e7, g02e702e602e7, g02e702e502e7, g02e602e7, g02e602e902e7, g02e602e802e7, g02e602e702e7, g02e602e502e7, g02e502e7, g02e502e902e7, g02e502e802e7, g02e502e702e7, g02e502e7); cpitch3 = (g02e902e6, g02e902e802e6, g02e902e702e6, g02e902e602e6, g02e902e502e6, g02e802e6, g02e802e902e6, g02e802e6, g02e802e602e6, g02e802e502e6, g02e702e6, g02e702e902e6, g02e702e802e6, g02e702e602e6, g02e702e502e6, g02e6, g02e602e902e6, g02e602e802e6, g02e602e702e6, g02e602e502e6, g02e502e6, g02e502e902e6, g02e502e802e6, g02e502e702e6, g02e502e602e6); cpitch4 = (g02e902e5, g02e902e802e5, g02e902e5, g02e902e602e5, g02e902e502e5, g02e802e5, g02e802e902e5, g02e802e702e5, g02e802e602e5, g02e802e502e5, g02e702e5, g02e702e902e5, g02e702e802e5, g02e702e5, g02e702e502e5, g02e602e5, g02e602e902e5, g02e602e802e5, g02e602e702e5, g02e602e502e5, g02e5, g02e502e902e5, g02e502e802e5, g02e502e702e5, g02e502e602e5); cpitches = (g02e9, g02e902e8, g02e902e7, g02e902e6, g02e902e5, g02e8, g02e802e9, g02e802e7, g02e802e6, g02e802e5, g02e7, g02e702e9, g02e702e8, g02e702e6, g02e702e5, g02e6, g02e602e9, g02e602e8, g02e602e7, g02e602e5, g02e5, g02e502e9, g02e502e8, g02e502e7, g02e502e6); endtable table(substitution); // Handle pitch character ligating cpitches g02e9 > _ cpitch0$1:(1 2) / _ ^ _; cpitches g02e8 > _ cpitch1$1:(1 2) / _ ^ _; cpitches g02e7 > _ cpitch2$1:(1 2) / _ ^ _; cpitches g02e6 > _ cpitch3$1:(1 2) / _ ^ _; cpitches g02e5 > _ cpitch4$1:(1 2) / _ ^ _; endtable; 5