PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC / UNICODE

Similar documents
ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC / UNICODE

A. Administrative. B. Technical General L2/ DATE:

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

ISO/IEC JTC1/SC2/WG2 N2817

Proposal to encode the DOGRA VOWEL SIGN VOCALIC RR

Proposal to encode MALAYALAM LETTER VEDIC ANUSVARA

Proposal to encode three Arabic characters for Arwi

Introduction. Acknowledgements

Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS

ISO/IEC JTC 1/SC 2/WG 2/N2789 L2/04-224

Andrew Glass and Shriramana Sharma. anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com November-2

John H. Jenkins If available now, identify source(s) for the font (include address, , ftp-site, etc.) and indicate the tools used:

ISO/IEC JTC1/SC2/WG2 N2641

Proposal to encode the END OF TEXT MARK for Malayalam

Proposal to encode the SANDHI MARK for Newa

Revised proposal to encode NEPTUNE FORM TWO. Eduardo Marin Silva 02/08/2017

Yes. Form number: N2652-F (Original ; Revised , , , , , , , )

Proposal to Encode Oriya Fraction Signs in ISO/IEC 10646

Proposal to encode Devanagari Sign High Spacing Dot

B. Technical General 1. Choose one of the following: 1a. This proposal is for a new script (set of characters) Yes.

1. Introduction 2. TAMIL DIGIT ZERO JTC1/SC2/WG2 N Character proposed in this document About INFITT and INFITT WG

Title: Application to include Arabic alphabet shapes to Arabic 0600 Unicode character set

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS. Yes (or) More information will be provided later:

0CE2 Ä KANNADA VOWEL SIGN VOCALIC L 0CE3 Å KANNADA VOWEL SIGN VOCALIC LL 0CE4 Ç KANNADA DANDA 0CE5 É KANNADA DOUBLE DANDA

Proposal to Add Four SENĆOŦEN Latin Charaters

Proposal to encode Kannada Sign Spacing Candrabindu

A B Ɓ C D Ɗ Dz E F G H I J K Ƙ L M N Ɲ Ŋ O P R S T Ts Ts' U W X Y Z

Structure Vowel signs are used in a manner similar to that employed by other Brahmi-derived scripts. Consonants have an inherent /a/ vowel sound.

ISO International Organization for Standardization Organisation Internationale de Normalisation

ISO/IEC JTC1/SC2/WG2. Universal Multiple-Octet Coded Character Set (UCS) - ISO/IEC Secretariat: ANSI

UC Irvine Unicode Project

ä + ñ ISO/IEC JTC1/SC2/WG2 N

5c. Are the character shapes attached in a legible form suitable for review?

Cibu Johny, 2014-Dec-26

1. Introduction.

Ꞑ A790 LATIN CAPITAL LETTER A WITH SPIRITUS LENIS ꞑ A791 LATIN SMALL LETTER A WITH SPIRITUS LENIS

ISO/IEC JTC1/SC2/WG2 N4157 L2/12-121

Proposal to Encode An Outstanding Early Cyrillic Character in Unicode

1 ISO/IEC JTC1/SC2/WG2 N

ISO/IEC JTC1/SC2/WG2 N4445

Proposal to encode the TAKRI VOWEL SIGN VOCALIC R

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

JTC1/SC2/WG2 N3514. Proposal to encode a SOCCER BALL symbol Karl Pentzlin, U+26xx SOCCER BALL

Proposal to Encode Phonetic Symbols with Retroflex Hook in the UCS

Proposal to encode MALAYALAM SIGN PARA

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document.

1. Introduction 2. TAMIL LETTER SHA Character proposed in this document About INFITT and INFITT WG

JTC1/SC2/WG2 N

ISO/IEC JTC1/SC2/WG2 N3734 L2/09-144R

JTC1/SC2/WG

The solid pentagram, as found in the top row of the figure above, is also used in mysticism and occult contexts (see figures in section 6).

Two distinct code points: DECIMAL SEPARATOR and FULL STOP

L2/ ISO/IEC JTC1/SC2/WG2 N4671

Request for encoding GRANTHA LENGTH MARK

5 U+1F641 SLIGHTLY FROWNING FACE

Request for encoding 1CF3 ROTATED ARDHAVISARGA

Proposal to Encode Some Outstanding Early Cyrillic Characters in Unicode

Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation

Proposal to encode the Prishthamatra for Nandinagari

L2/ ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

ISO/IEC JTC 1/SC 2/WG 2 N3086 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS 1

5 U+1F641 SLIGHTLY FROWNING FACE

Proposal to Encode an Abbreviation Sign for Bengali. Srinidhi A and Sridatta A. Tumakuru, Karnataka, India

JTC1/SC2/WG2 N3048. Form number: N2352-F (Original ; Revised , , , , , , ) Page 1 of 5

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

ARABIC LETTER BEH WITH SMALL ARABIC LETTER MEEM ABOVE MBEH (BEH WITH MEEM ABOVE) Represents mba sound

097B Ä DEVANAGARI LETTER GGA 097C Å DEVANAGARI LETTER JJA 097E Ç DEVANAGARI LETTER DDDA 097F É DEVANAGARI LETTER BBA

Proposal to encode fourteen Pakistani Quranic marks

Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation

1. Introduction. 2. Encoding Considerations

Doc Type: Working Group Document Title: Recommendations for encoding Xiàngqí game symbols Source: Michael Everson (

YES (or) More information will be provided later:

ISO/IEC JTC1/SC2/WG2 N4078

Üù àõ [tai 2 l 6] (in older orthography Üù àõ»). Tai Le orthography is simple and straightforward:

Request for encoding 1CF4 VEDIC TONE CANDRA ABOVE

ISO/IEC JTC/SC2/WG Universal Multiple Octet Coded Character Set (UCS)

Introduction. Requests. Background. New Arabic block. The missing characters

PHOENICIAN NUMBER ONE. More recent research into Imperial Aramaic (N3273R2, L2/07-197R2, 2007-

Proposal to encode the TELUGU SIGN SIDDHAM

Yes 11) 1 Form number: N2652-F (Original ; Revised , , , , , , , 2003-

transcribing Urdu or Arabic words. Accordingly, the KHHA and GHHA should be considered atomic, as Tibetan TSA, TSHA, and DZA are.

See also:

2. Requester's name: Urdu and Regional Language Software Development Forum, Ministry of Science and Technology, Government of Pakistan

ISO/IEC JTC1/SC2/WG2 N4599 L2/

ISO/IEC JTC1/SC2/WG2 N4210

ISO/IEC JTC 1/SC 2/WG 2 Universal Multiple-Octet Coded Character Set (UCS)

L2/ Proposal to encode archaic vowel signs O OO for Kannada. 1. Thanks. 2. Introduction

1. Introduction. 2. Proposed Characters Block: Latin Extended-E Historic letters for Sakha (Yakut)

JTC1/SC2/WG2 N4945R

1.1 The digit forms propagated by the Dozenal Society of Great Britain

Although the International Phonetic Alphabet deprecated the letters. Unicode Character Properties. Character properties are proposed here.

Title: A proposal to encode the Akarmatrik music notation symbols in UCS. Author: Chandan Misra. Submission Date:

ISO/IEC JTC1 SC2 WG2 N3997

ISO/IEC JTC/1 SC/2 WG/2 N2095

Proposal to encode. 1. Discussion

also represented by combnining vowel matras with ē, and ō: ayɯ, eyi, ayi;

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC A.

Form number: N2652-F (Original ; Revised , , , , , , , )

Transcription:

PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646 / UNICODE A. Administrative 1. Title: Revised Proposal for Encoding Syloti Nagri Script in the BMP 2. Requesters names: Peter Constable (SIL International); James Lloyd-Williams and Sue Lloyd-Williams (Sylheti Translation And Research, London); Advocate Shamsul Islam Chowdhury (Chairman, Sylot Academy, Sylhet, Bangladesh); Professor Asaddar Ali (Vice Chairman, Sylot Academy, Sylhet, Bangladesh); Mohammed Sadique; Matiar Rahman Chowdhury (Chairman, Sylot Academy (UK and Europe), London). 3. Requester type: Expert contribution 4. Submission date: 5. Requester s reference Prior L2/UTC documents L2/02-387, L2/02-388, L2/03-146r. (if applicable): 6a. Completion: This is a complete proposal 6b. More information to See referenced document in item 5 be provided B. Technical General 1a. New script? Name? Syloti Nagri. 1b. Addition of characters to existing block? Name? No. 2. Number of characters 45 3. Proposed category Category A 4. Proposed level of implementation and rationale Level 3. Syloti Nagri script contains combining diacritics. 5a. Character names included in proposal? 5b. Character names in accordance with guidelines? 5c. Character shapes attached in a reviewable form? Page 1

6a. Who will provide computerized font? Sue Lloyd-Williams (Sylheti Translation And Research) 6b. Font currently available? 6c. Font format? TrueType 7a. Are references (to other character sets, dictionaries, descriptive texts, etc) provided? 7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? 8. Does the proposal address other aspects of character data processing? C. Technical Justification 1. Has this proposal for addition of character(s) been submitted before? This is a revision resulting from a request by UTC 93 (action item 93-A102) to reconsider specific issues and make specific revisions. 2. Contact with the user community? 3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? Sizeable communities in Sylhet region of Bangladesh, in Calcutta, in England and elsewhere. 3b. Reference 4a. The context of use for the proposed characters (type of use; common or rare)? Publishing in the script was known to have been done by several presses in Bangladesh and Calcutta as recently as the 1970s. More recently, no metal-type facilities have been available, but development of digital type has resulted in renewed use and growing interest. 4b.Reference 5. Are the proposed characters in current use by the user community? 5b. Where? In London and Birmingham, and in Sylhet, possibly elsewhere. 5c. Reference Page 2

6. After giving due considerations to the principles in N 1352 must the proposed characters be entirely in the BMP? 6b. Rationale Living script. 6c. Reference 7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? One possible exception is SYLOTI NAGRI SIGN FUL (U+xx28 in the chart): a similar character can be found in Bengali poetic texts, and thus it may make sense to unify this character across scripts. If unification were desired, then this character could be added to the General Punctuation block, as it is a punctuation character. 8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? No. 9a. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? Syloti Nagri has similarities with other North Indic scripts, but is distinct; it is no more like any other Indic scripts already in the UCS than are any of them to one another. Thus, for purposes of this proposal, these characters cannot be considered similar to existing characters. 10a. Does the proposal include use of combining characters and/or use of composite sequences (see clause 4.11 and 4.13 in ISO/IEC 10646-1)? 10b. If YES, is a rationale for such use provided? 10c. Reference 10d. Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided? 10e. Reference 11. Does the proposal contain characters with any special properties such as control function or similar semantics? If YES, describe in detail (include attachment if necessary). (See additional notes below and further discussion in L2/02-388 and L2/03-146r.) Page 3

D. Proposed Characters As discussed in section II.6 of L2/02-388, the basic collation order for Syloti Nagri is well established, but there are attested variations in ordering in relation to a few details. In the following chart, characters are ordered in what we believe to be the most preferable ordering. As discussed in section II.1.6 of L2/02-388, there is very limited attestation in manuscripts of a set of Syloti Nagri digits. The evidence is not sufficient to support a proposal to encode Syloti Nagri digits at this time. It is conceivable that a proposal for Syloti Nagri digits may become justified at some future pont if further manuscripts are discovered, in which case an additional column of characters would be required. We have no reason to anticipate this happening in the near future, however. (The code chart is presented on a new page.) Page 4

Code Chart: xx0 xx1 xx2 xx3 0 A O e 1 B P f 2 p Q g 3 C R h 4 D S i 5 E T k 6 ^ U o 7 F V hm 8 G W v 9 H X w A I Y x q B Z y C K a xx D L b E M c F N d Character Names: U+xx00 SYLOTI NAGRI LETTER A U+xx01 SYLOTI NAGRI LETTER I U+xx02 SILOTI NAGRI SIGN DVISVARA U+xx03 SYLOTI NAGRI LETTER U U+xx04 SYLOTI NAGRI LETTER E U+xx05 SYLOTI NAGRI LETTER O U+xx06 SYLOTI NAGRI SIGN HASANTA = halant, virama U+xx07 SYLOTI NAGRI LETTER KO U+xx08 SYLOTI NAGRI LETTER KHO U+xx09 SYLOTI NAGRI LETTER GO U+xx0A SYLOTI NAGRI LETTER GHO U+xx0B SYLOTI NAGRI SIGN ANUSVARA U+xx0C SYLOTI NAGRI LETTER CO U+xx0D SYLOTI NAGRI LETTER CHO U+xx0E SYLOTI NAGRI LETTER JO U+xx0F SYLOTI NAGRI LETTERJHO U+xx10 SYLOTI NAGRI LETTER TTO U+xx11 SYLOTI NAGRI LETTER TTHO U+xx12 SYLOTI NAGRI LETTER DDO U+xx13 SYLOTI NAGRI LETTER DDHO U+xx14 SYLOTI NAGRI LETTER TO U+xx15 SYLOTI NAGRI LETTER THO U+xx16 SYLOTI NAGRI LETTER DO U+xx17 SYLOTI NAGRI LETTER DHO U+xx18 SYLOTI NAGRI LETTER NO U+xx19 SYLOTI NAGRI LETTER PO U+xx1A SYLOTI NAGRI LETTER PHO U+xx1B SYLOTI NAGRI LETTER BO U+xx1C SYLOTI NAGRI LETTER BHO U+xx1D SYLOTI NAGRI LETTER MO U+xx1E SYLOTI NAGRI LETTER RO U+xx1F SYLOTI NAGRI LETTER LO U+xx20 SYLOTI NAGRI LETTER RRO U+xx21 SYLOTI NAGRI LETTER SO U+xx22 SYLOTI NAGRI LETTER HO U+xx23 SYLOTI NAGRI VOWEL SIGN A U+xx24 SYLOTI NAGRI VOWEL SIGN I U+xx25 SYLOTI NAGRI VOWEL SIGN U U+xx26 SYLOTI NAGRI VOWEL SIGN E U+xx27 SYLOTI NAGRI VOWEL SIGN OO U+xx28 SILOTI NAGRI SIGN FUL U+xx29 SILOTI NAGRI POETRY MARK 1 U+xx2A SILOTI NAGRI POETRY MARK 2 U+xx2B SILOTI NAGRI POETRY MARK 3 U+xx2C SILOTI NAGRI POETRY MARK 4 U+xx2D..U+xx3F (These positions shall not be used.) Page 5

Unicode character properties With the exception of the characters listed below, we propose that all characters have a General Category property of Lo, a Canonical Combining Class property of 0, a Bidirectional Class property of L, a Mirrored property of N, an East Asian Width property of N, and a Line Breaking property of AL. In the information provided below, only those values that differ from the defaults just mentioned are listed. Characters U+xx06 U+xx02, U+xx0B, U+xx26 U+xx23, U+xx24, U+xx27 U+xx25 U+xx28 U+xx29..U+xx2C U+xx2D Properties General Category = Mn, Canonical Combining Class = 9, Bidi Class = NSM, Line Breaking = CM General Category = Mn, Canonical Combining Class = 230, Bidi Class = NSM, Line Breaking = CM General Category = Mc, Canonical Combining Class = 0, Line Breaking = CM General Category = Mn, Canonical Combining Class = 220, Bidi Class = NSM, Line Breaking = CM General Category = Po, Line Breaking = QU General Category = Po, Line Breaking = AL General Category = Cf, Line Breaking = SA The canonical combining classes assigned to characters of general category Mc (other than hasanta / virama) do not match typical practice for Indic scripts (which would be 0). We propose these classes as they better deal with the behaviours of the script and are less likely to lead to problems in implementation and use. For instance, the DVISVARA and VOWEL SIGN U can co-occur, but these combining marks do not typographically interact, so alternate orderings of the characters cannot result in visible distinction and hence cannot have any distinct information value. By putting these two combining marks in distinct, non-zero combining classes, these two orderings become canonically equivalent. Additional notes One of the results of discussion of documents L2/02-387 and L2/02-388 at the UTC 93 meeting was that the proposed encoding model for dealing with conjunct formation was considered controversial. In L2/03-146r, various alternate encoding models for Syloti Nagri were discussed. Our proposal has been revised to use an encoding model like that used for Myanmar script (model D in L2/03-146r): U+xx06 SYLOTI NAGRI HASANTA is used to determine formation of a conjunct of the preceding and following characters, in which case the HASANTA has no direct visual display. If, however, the HASANTA is followed by ZWNJ, no conjunct is formed, and the HASANTA becomes visible. Page 6