Issues in Khmer Unicode 4.0

Javier Solá

Abstract

Some changes introduced in the standard order of components in Khmer Unicode 4.0 either make it incompatible with Unicode 3.0 or introduce ambiguity into the standard order for modern Khmer, permitting different orders that lead to the same graphic representation. These and other aspects of Khmer Unicode 4.0 are considered here, leading to a proposal for a new standard order of components that, while still permitting old Khmer forms to be typed, eliminates the ambiguity and the backwards compatibility problems.

1. Introduction

This document covers the following points.

One of the components of Khmer script is the consonant shifter, a character that modifies the sound of the consonant it relates to. In the Khmer Unicode 3.0 standard order of components, this character was placed after the base consonant and after any subscript consonants (a second or third consonant within the orthographic syllable). In Unicode 4.0 the consonant shifter is placed after the base consonant, but before any subscript consonants. Text written following the Unicode 3.0 rules is therefore no longer compatible with Unicode in its present form (version 4.0). This is exactly the kind of situation that Unicode tries to avoid. It should be corrected by allowing both placements in the next version of Unicode, so that it is compatible with both Unicode 3.0 and Unicode 4.0.

A construction that does not exist in Khmer has been included in the standard order of components: the robat sign after a subscript consonant, something that never happens in Khmer.

Also, after all vowels and signs, a Khmer coeng consonant is included once again at the end of the standard order of components. This situation never occurs in modern Khmer, but was used in some special cases in old Khmer. Its inclusion leads to ambiguity, allowing users of modern Khmer to encode words in two different ways that produce the same representation. This complicates collation, searching and spell checking. A solution that still allows the coding of old Khmer is proposed.

The accompanying text of the Unicode standard describes, with examples, the use of the ZERO WIDTH NON-JOINER character in a location not specified in the standard order of components. That location should be included in the order.

Javier Solá, Open Forum of Cambodia. Version /21/2004

2. Background

In Unicode 3.0, a Khmer orthographic syllable is considered to be of the form:

B {S}* {C} {V} {O}

Where [1]:

B is a consonant or independent vowel
S is a subscript consonant or independent vowel sign
C is a consonant shifter
V is a dependent vowel
O is any other Khmer sign

In most cases, this form agrees with the way Cambodians spell in their language.

Unicode 4.0, however, defines the standard order of components in a Khmer orthographic syllable, expressed in BNF, as:

B {R C} {S {R}}* {{Z} V} {O} {S}

Where:

B is a base character (consonant character, independent vowel character, and so on)
R is a robat
C is a consonant shifter
S is a subscript consonant or independent vowel sign
V is a dependent vowel sign
Z is the zero width non-joiner
O is any other sign

Furthermore, the text (page 281) says that a Z (zero width non-joiner) can also be placed before the C (consonant shifter), making the final form for Unicode 4.0:

B {R {Z} C} {S {R}}* {{Z} V} {O} {S}

[1] These are not the names or abbreviations that Unicode 3.0 gives to the components. To allow comparison with Unicode 4.0, the names and abbreviations used here for the Unicode 3.0 standard order of components are the same as those used in Unicode 4.0.
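The two orders can be compared mechanically. The following is a minimal sketch, not taken from either version of the standard, that writes both orders as Python regular expressions. The mapping of each abbreviation to code points is an assumption based on the Khmer block, `{R C}` is read as "an optional robat or consonant shifter" (the formula as transcribed shows no separator between R and C), and the names `UNICODE_3_0` and `UNICODE_4_0` are illustrative only. The two test strings are synthetic sequences, not necessarily real words.

```python
import re

# Assumed mapping of the abbreviations to code points in the Khmer block.
B = "[\u1780-\u17B3]"                           # base: consonant or independent vowel
S = "\u17D2[\u1780-\u17B3]"                     # coeng + consonant or independent vowel
C = "[\u17C9\u17CA]"                            # shifters: muusikatoan, triisap
R = "\u17CC"                                    # robat
V = "[\u17B6-\u17C5]"                           # dependent vowel signs
Z = "\u200C"                                    # ZERO WIDTH NON-JOINER
O = "[\u17C6-\u17C8\u17CB\u17CD-\u17D1\u17DD]"  # other signs (assumed set)

# B {S}* {C} {V} {O}
UNICODE_3_0 = re.compile(f"{B}(?:{S})*{C}?{V}?{O}?")

# B {R {Z} C} {S {R}}* {{Z} V} {O} {S}, reading {R ... C} as an alternation
UNICODE_4_0 = re.compile(
    f"{B}(?:{R}|{Z}?{C})?(?:{S}{R}?)*(?:{Z}?{V})?{O}?(?:{S})?"
)

# sa + coeng + ro + triisap + ii: shifter after the subscript (3.0 placement)
shifter_after = "\u179F\u17D2\u179A\u17CA\u17B8"
# sa + triisap + coeng + ro + ii: shifter before the subscript (4.0 placement)
shifter_before = "\u179F\u17CA\u17D2\u179A\u17B8"

for name, pattern in (("3.0", UNICODE_3_0), ("4.0", UNICODE_4_0)):
    print(name,
          bool(pattern.fullmatch(shifter_after)),
          bool(pattern.fullmatch(shifter_before)))
```

Under these assumptions each pattern accepts only its own shifter placement, which is the incompatibility discussed in section 5.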

3. Coeng (subscript) consonants at the end of the syllable

Khmer Unicode order is based on Khmer spelling order, which normally differs from handwriting order. In spelling order, in modern Khmer, vowels are always placed after coeng (subscript) consonants, because it is the last coeng consonant whose sound is continued by the vowel.

Unicode 4.0 places coeng consonants between the base consonant and the vowel (their traditional location, as well as their Unicode 3.0 location), but it also includes a second placement at the very end of the standard order of components, for compatibility with old forms of Khmer. This leads to ambiguity in modern Khmer, as words with vowels and coeng consonants can be written in two different ways.

Following this rule, the word ក្តី could be spelled in two different ways, leading to identical representation:

The Unicode 3.0 way (Khmer spelling order): ka + coeng + ta + ii
Placing the vowel before the coeng consonant: ka + ii + coeng + ta

This makes searching, collation and spelling algorithms extremely difficult.

The ambiguity could be resolved, while still allowing rare old forms, by permitting the final coeng consonant only when it is preceded by a ZWJ (ZERO WIDTH JOINER) character. This allows old forms while ensuring that no mistakes or ambiguities arise in modern Khmer. With this change, an old form that uses a coeng consonant after a vowel should use the ZWJ character, as in:

ទង to + a + nikahit + ZWJ + coeng + ngo [both (= ទង)]

The standard order of components would change from

B {R C} {S {R}}* {{Z} V} {O} {S}

to

B {R C} {S {R}}* {{Z} V} {O} {ZJ S}

where ZJ stands for ZERO WIDTH JOINER.
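To make the ambiguity concrete, the sketch below builds the two spellings from the code point names given above and shows that Unicode normalization does not merge them. The helper `to_spelling_order` is a hypothetical repair step, not something the standard or this proposal defines: it moves a coeng consonant that directly follows a dependent vowel back in front of the vowel. It is a no-op when a ZWJ precedes the coeng, and it does not handle a sign sitting between the vowel and the coeng.

```python
import re
import unicodedata

KA, TA, II, COENG = "\u1780", "\u178F", "\u17B8", "\u17D2"

spelling_order = KA + COENG + TA + II   # ka + coeng + ta + ii
vowel_first = KA + II + COENG + TA      # ka + ii + coeng + ta


def code_points(text: str) -> str:
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A dependent vowel immediately followed by coeng + consonant.
# A ZWJ before the coeng breaks the adjacency, so such forms are left alone.
VOWEL_THEN_COENG = re.compile("([\u17B6-\u17C5])(\u17D2[\u1780-\u17A2])")


def to_spelling_order(text: str) -> str:
    """Hypothetical repair: put the coeng consonant before the vowel."""
    return VOWEL_THEN_COENG.sub(r"\2\1", text)


print(code_points(spelling_order))   # U+1780 U+17D2 U+178F U+17B8
print(code_points(vowel_first))      # U+1780 U+17B8 U+17D2 U+178F
print(same_after_nfc(spelling_order, vowel_first))       # False
print(to_spelling_order(vowel_first) == spelling_order)  # True
```

Because the two strings differ and normalization leaves them different, a plain string comparison or search treats them as different words, even though they are meant to display identically.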

4. Robat after a coeng (subscript) consonant

The Unicode 4.0 book says about robat:

The Khmer sign robat historically corresponds to the Devanagari repha, a representation of syllable-initial r-. However, the Khmer script can treat the initial r- in the same way as the other initial consonants, namely, a consonant character ro and as many subscript consonant signs as necessary. There are old loan words from Sanskrit and Pali including robat, but in some of them, the robat is not pronounced and is preserved in a fossilized spelling. Because robat is a distinct sign from the consonant character ro, the Unicode Standard encodes U+17CC KHMER SIGN ROBAT while it treats the Devanagari repha as a part of a ligature without encoding it. The authoritative Chuon Nath dictionary sorts robat as if it were a base consonant character, just as the repha is sorted in scripts that use it. The consonant over which robat resides is then sorted as if it were a subscript.

Examples of consonant clusters beginning with ro and robat:

ចរ ro + aa + co + ro + coeng + sa + ii [rè'crsei] king hermit
យ qa + aa + yo + robat [paqrya] civilized ( រយ, qa + aa + ro + coeng + yo)
ពតមន po + ta + robat + mo + aa + no [pmqdtmè'n] news

Robat is used for loan words that have been taken from Pali and Sanskrit. It is interesting to look at the complete list of words in the authoritative Chuon Nath dictionary that use robat:

កកដ ទយស ពណន វសទពណ អធម កប រ ទយយធម ពតមន វសគ : អនយ កណ ទសធម ពណន សកដមគ អន យកធម កបស ធម ពពណន សងខតធម អយត ធម គភ នយកធម ពធបកខយធម សមបណ អឃ ឆកមវចរស គ នយយនកធម ពយធធម សគ : អជ ន ជងឃមគ នវរណធម ម ទបព សពជញ អថ ជតធម បញច ពណ មគ សពងគ អសងខតធម តបធម បរប រណ មគ ស ពជញ ឃ ត យត ន បបធម យត ធម សទធ ថ ថ ទកខ ពត ប ណម មពណ សជ វធម យន ទដ ធម ប ព លអកពណ សពណ ឧនម គ ទគត ប ពទស វណ ស គ ទ គម ប ពនមត វបរ មធម ស គ ទជ ន បកខរពស វបយយ ធម ទពល ពណ វបយស ហមពណ

We can see very quickly that none of them has a coeng consonant in the same orthographic syllable as the robat (nor a superscript vowel).

Also, when រ appears in Chuon Nath, it is never followed by two subscript consonants. The words in this dictionary that include រ with a subscript consonant are:

ករ នស យចរយ ពទធ ចរយ វឌ ចរយ អនយតរ ថយ ករ សពទ បព ជ ចរយ មងគល ទពចរយ ទធ ចរយ អ ទ រយ ករ បចឆ ចរយ មហ ច រយ សរ ចរយ គនថចរនចរយ បដ រយ ម ហស រយ ស រយ រយ ជតកចរយ តរ ថយ ពរ ឡន ពទ រយ មករ វរ មន ស រយកន ម រ រយក សរ ពស ទរ ភក ពទ រយ វរ មត អនសចរយ

This is probably due to the fact that in most words that include two coeng consonants, the second one is (only 10 exceptions to this rule in all of Khmer).

If words with sound combinations similar to the one in the English word Arthritis (R + TH + R) were to be brought into Khmer (it would require the រ ថ or another similar combination), we have to assume that they would be written using the modern Khmer form រ ថ and not robat, as in ថ.

All this shows that there are no words in Khmer that include an orthographic syllable combining robat and a coeng consonant, nor is there a reasonable possibility of such words being created. The second robat in the Unicode 4.0 standard order of components, after the first (or second) coeng consonant, is therefore unnecessary: it describes something that does not exist in Khmer and that is not desirable to permit. The standard order of components should change from:

B {R C} {S {R}}* {{Z} V} {O} {ZJ S}

to:

B {R C} {S}* {{Z} V} {O} {ZJ S}

5. Placement of the consonant shifter in the standard order of components

The change in the location of the consonant shifter (CS) in the standard order of components from Unicode 3.0 to Unicode 4.0 has broken the Unicode standard for Khmer, making specifications and fonts written for Unicode 3.0 incompatible with Unicode 4.0.

In Unicode 3.0 the CS was placed after the base consonant and the coeng consonant, in a location that fits spelling order. Khmer speakers always think of the CS in this position because, before writing the coeng consonant, they do not know where the CS will be placed physically. There are no cases in the Chuon Nath dictionary in which the CS is combined with two coeng consonants.

B {S}* {C} {V} {O} - Unicode 3.0 (C stands for consonant shifter)

Unicode 4.0 has moved the location in which the CS has to be typed to a position before the coeng consonant. By doing this, it has made all specifications and files written before Unicode 4.0 incompatible with Unicode:

B {R C} {S}* {{Z} V} {O} {S} - Unicode 4.0

From a compatibility point of view, the next version of Unicode should accept the CS in both positions, in order to be backwards compatible with Unicode 3.0 and 4.0, leading to:

B {R C} {S}* {C} {{Z} V} {O} {ZJ S} [2]

Leaving the technical standards discussion aside, and as far as the Khmer language is concerned, the question of which place the CS should occupy is a complicated one. When the CS accompanies a base consonant and a coeng consonant, it sometimes affects the base consonant, as in the cases of ស ស ង and ម ក គក, and in other words it affects the coeng consonant, as in បន [3].

In the case of the word បន (ប + ន in Unicode 3.0), if the CS were placed before (ប + ន ), it would affect ន, and it would not have to be a, but a (only can affect ន, and only will it shift correctly to the subscript form). In other words, the wrong character would have to be written in order to obtain the correct glyph (or all fonts would need to be re-developed and re-distributed, which is what this standard tries to avoid).

[2] There are never two coeng consonants in combination with a consonant shifter in the same orthographic syllable.

[3] This is independent of the fact that the user will probably think about placing the CS after writing the base and the coeng consonant (because at that point, if the CS changes to the subscript form, he knows where to write it).
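If both positions of the CS were accepted, software that compares or searches text would probably want to fold them into one. The sketch below is one hypothetical way to do that, not something the proposal specifies: it moves a shifter typed between the base consonant and a coeng consonant to the Unicode 3.0 position after the coeng consonant. The name `shifter_to_3_0_position` and the choice of the 3.0 position as the target are assumptions, and a ZWNJ before the shifter is not handled. Since the shifter can apply to either consonant, as discussed above, the result is best treated as a comparison key rather than as corrected text.

```python
import re

CONSONANT = "[\u1780-\u17A2]"
SHIFTER = "[\u17C9\u17CA]"      # muusikatoan, triisap
COENG = "\u17D2"

# consonant, then shifter, then one coeng consonant (the 4.0 placement)
SHIFTER_FIRST = re.compile(f"({CONSONANT})({SHIFTER})({COENG}{CONSONANT})")


def shifter_to_3_0_position(text: str) -> str:
    """Move a shifter typed before a coeng consonant to just after it."""
    return SHIFTER_FIRST.sub(r"\1\3\2", text)


# Synthetic sequences, not necessarily real words.
typed_4_0 = "\u179F\u17CA\u17D2\u179A\u17B8"   # sa + triisap + coeng + ro + ii
typed_3_0 = "\u179F\u17D2\u179A\u17CA\u17B8"   # sa + coeng + ro + triisap + ii

print(shifter_to_3_0_position(typed_4_0) == typed_3_0)  # True
print(shifter_to_3_0_position(typed_3_0) == typed_3_0)  # True
```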

The solution of allowing the CS in two places seems to be the most correct one in both technical and orthographic terms. Of course, allowing the same character in two different locations leads to ambiguity in the cases in which the base consonant and the coeng consonant belong to the same series, but the number of cases is small, and this seems a minor problem compared to having to use the wrong character in some cases. Again, maybe there is a better solution.

6. Zero width non-joiner

Besides the use of the ZERO WIDTH NON-JOINER to avoid consonant-vowel ligatures, where it is placed in the location indicated by the standard order of components, page 282 of the Unicode 4.0 book says that:

If either muusikatoan or triisap needs to keep its superscript shape (as an exception to the general rule where other superscripts typically force the alternative subscript glyph for either character), U+200C ZERO WIDTH NON-JOINER should be inserted before the consonant shifter to show the normal glyph for a consonant shifter when the general rule requires the alternative glyph. In such cases, U+200C ZERO WIDTH NON-JOINER is inserted before the vowel sign.

If we integrate this into the standard order of components, it gives us:

B {R {{Z} C}} {S}* {{Z} C} {{Z} V} {O} {ZJ S}

7. Conclusion

It is necessary to modify the standard order of components for Khmer Unicode for several reasons: the standard has been broken, the attempt to include old Khmer forms has led to ambiguity in the order of characters that should be accepted, and comments in the text do not fit the sequence.

The final order of components should be:

B {R {{Z} C}} {S}* {{Z} C} {{Z} V} {O} {ZJ S}

Where:

B is a base character (consonant or independent vowel character)
R is a robat
C is a consonant shifter
S is a subscript consonant or independent vowel sign
V is a dependent vowel sign
Z is the zero width non-joiner
ZJ is the zero width joiner
O is any other sign
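The proposed order can be turned into a small checker. This is a minimal sketch under stated assumptions, not a reference implementation: the code point sets behind each abbreviation are assumed from the Khmer block, `{R {{Z} C}}` is read as "an optional robat, or an optional shifter with an optional ZWNJ before it", and `PROPOSED` and `is_valid_syllable` are illustrative names.

```python
import re

# Assumed mapping of the abbreviations to code points.
B = "[\u1780-\u17B3]"                           # consonant or independent vowel
S = "\u17D2[\u1780-\u17B3]"                     # coeng + consonant or independent vowel
C = "[\u17C9\u17CA]"                            # consonant shifters
R = "\u17CC"                                    # robat
V = "[\u17B6-\u17C5]"                           # dependent vowel signs
Z = "\u200C"                                    # ZERO WIDTH NON-JOINER
ZJ = "\u200D"                                   # ZERO WIDTH JOINER
O = "[\u17C6-\u17C8\u17CB\u17CD-\u17D1\u17DD]"  # other signs (assumed set)

# B {R {{Z} C}} {S}* {{Z} C} {{Z} V} {O} {ZJ S}
PROPOSED = re.compile(
    f"{B}(?:{R}|{Z}?{C})?(?:{S})*(?:{Z}?{C})?(?:{Z}?{V})?{O}?(?:{ZJ}{S})?"
)


def is_valid_syllable(syllable: str) -> bool:
    """True if the whole string is one syllable in the proposed order."""
    return PROPOSED.fullmatch(syllable) is not None


samples = {
    "ka + coeng + ta + ii": "\u1780\u17D2\u178F\u17B8",
    "ka + ii + coeng + ta": "\u1780\u17B8\u17D2\u178F",
    "to + aa + nikahit + ZWJ + coeng + ngo": "\u1791\u17B6\u17C6\u200D\u17D2\u1784",
    "ka + coeng + ta + robat": "\u1780\u17D2\u178F\u17CC",
}
for label, text in samples.items():
    print(label, is_valid_syllable(text))
```

With these assumptions the modern spelling order and the ZWJ-marked old form are accepted, while the vowel-first spelling and a robat after a coeng consonant are rejected.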

Khmer Angkor Keyboard

Khmer Angkor Keyboard Khmer Angkor Keyboard Contents Overview... 2 Khmer Angkor Keyboard Layouts... 2 Desktop Layout Windows/macOS... 2 Touch Layout Android/iOS... 3 Khmer Character Categories and Keystrokes for Desktop...

More information

Ubuntu Desktop LTS

Ubuntu Desktop LTS 3 ឧ ( ) TRISILCO SOLUTIONS (CAMBODIA) CO., LTD ល Desktop Virtualization ឧ ណ NComputing Ubuntu Desktop 12.04 LTS ០៦ ១០ ឧ ២០១៣ ១: Ubuntu Desktop 12.04 LTS ១. Computer Operating System... ១ ២. ង Ubuntu Dekstop

More information

Khmer Collation Development

Khmer Collation Development Khmer Collation Development Chea Sok Huor, Atif Gulzar, Ros Pich Hemy and Vann Navy Csh007@gmail.com, atif.gulzar@gmail.com, pichhemy@gmail.com Abstract This document discusses the research on Khmer Standardization

More information

A. Administrative. B. Technical General L2/ DATE:

A. Administrative. B. Technical General L2/ DATE: L2/02-096 DATE: 2002-02-13 DOC TYPE: Expert contribution TITLE: Proposal to encode Khmer subscript characters CHEA Sok Huor, LAO Kim Leang, HARADA Shiro, Norbert SOURCE: KLEIN PROJECT: STATUS: Proposal

More information

បង ក ត Excel UserForm ង ម ប បញ ច លទ ន នន យង ក ន ត រ នន Excel 2007

បង ក ត Excel UserForm ង ម ប បញ ច លទ ន នន យង ក ន ត រ នន Excel 2007 បង ក ត Excel UserForm ង ម ប បញ ច លទ ន នន យង ក ន ត រ នន Excel 2007 UserForm គ ជ ម ខង រម យង ក ន ក ម មវ ធ Microsoft Excel ដ លជ Form ម យម ន លក ខណ ព ង ម រម ប ង ក អ នក យក វ ម ក ងម រប ក ន ក រក ណត រង រក ង ម ប

More information

Proposal on Handling Reph in Gurmukhi and Telugu Scripts

Proposal on Handling Reph in Gurmukhi and Telugu Scripts Proposal on Handling Reph in Gurmukhi and Telugu Scripts Nagarjuna Venna August 1, 2006 1 Introduction Chapter 9 of the Unicode standard [1] describes the representational model for encoding Indic scripts.

More information

[aksa:] by Dara Saoyuth and Christine Schmutzler Phnom Penh 2010

[aksa:] by Dara Saoyuth and Christine Schmutzler Phnom Penh 2010 [aksa:] by Dara Saoyuth and Christine Schmutzler Phnom Penh 2010 ក រធ វ ឱ យដ ងអ ព ម ទ ទវ ទ យ ន ក ន ងប ទ សកម ព ជ ក រយល ឃ ញត មរយ ក ររចន ក ហ វ ក Raising awareness of Typography in Cambodia A Graphic Design

More information

Khmer OCR for Limon R1 Size 22 Report

Khmer OCR for Limon R1 Size 22 Report PAN Localization Project Project No: Ref. No: PANL10n/KH/Report/phase2/002 Khmer OCR for Limon R1 Size 22 Report 09 July, 2009 Prepared by: Mr. ING LENG IENG Cambodia Country Component PAN Localization

More information

Proposal to encode Devanagari Sign High Spacing Dot

Proposal to encode Devanagari Sign High Spacing Dot Proposal to encode Devanagari Sign High Spacing Dot Jonathan Kew, Steve Smith SIL International April 20, 2006 1. Introduction In several language communities of Nepal, the Devanagari script has been adapted

More information

The proposer gratefully acknowledges the help of Jony Rosenne in preparing this proposal.

The proposer gratefully acknowledges the help of Jony Rosenne in preparing this proposal. Title: Source: Status: Action: On the Hebrew vowel HOLAM Peter Kirk Date: 2004-06-05 Individual Contribution For consideration by the UTC The proposer gratefully acknowledges the help of Jony Rosenne in

More information

JTC1/SC2/WG2 N

JTC1/SC2/WG2 N Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация по стандартизации Doc Type: Working Group

More information

Microsoft Word ១ របប បច ល Windows Explorer ២. របប បស អ តផ ផ ថ ស ( Scan Disk and Defragmenter)... 5

Microsoft Word ១ របប បច ល Windows Explorer ២. របប បស អ តផ ផ ថ ស ( Scan Disk and Defragmenter)... 5 Table of Contents Microsoft Word... 7 ១ របប បច ល Windows Explorer... 5 ២. របប បស អ តផ ផ ថ ស ( Scan Disk and Defragmenter)... 5 ៣. របប បព ន តយបម លថ ស ( Scan Disk )... 6 ៤. របប បស អ តបមប គ ( Scan Virus )...

More information

1. Internet. Internet គ រត ប រង ទ រប នជ វ ព កមរយ: Protocol TCP/IP. Internet គ យ ច ប ស ង. Internet (ISP) Internet

1. Internet. Internet គ រត ប រង ទ រប នជ វ ព កមរយ: Protocol TCP/IP. Internet គ យ ច ប ស ង. Internet (ISP) Internet ម យៗន ហមក ក ង វ ន ព យស ន មព នកមហ ន ម កចធ ស ធ នក ព ង ផ នឬឯ ពយស ង ក ទខ ន ច ន យ ព ន ដល កមហ ពអច ម យ ន ន ន ស កជ ន ន ច ស ល ទ ល ន នក ង ច យ ន ន ធ ផ ធ ធ ធ ល ព ក កមហ មច ច ពចល ន ហ ស ច ជ កន ន កចម ស នស ង ជ សន ធ ស ស

More information

Proposal to encode the SANDHI MARK for Newa

Proposal to encode the SANDHI MARK for Newa Proposal to encode the SANDHI MARK for Newa Srinidhi A and Sridatta A Tumakuru, India srinidhi.pinkpetals24@gmail.com, sridatta.jamadagni@gmail.com December 23, 2016 1 Introduction This is a proposal to

More information

JTC1/SC2/WG2 N4945R

JTC1/SC2/WG2 N4945R Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация по стандартизации Doc Type: Working Group

More information

Bengali Script: Formation of the Reph and Yaphala, and use of the ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER

Bengali Script: Formation of the Reph and Yaphala, and use of the ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER Bengali Script: Formation of the Reph and Yaphala, and use of the ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER Written by: Paul Nelson, Microsoft Corporation Last Updated: 30 June 2003 Overview: In the

More information

1. Introduction 2. TAMIL LETTER SHA Character proposed in this document About INFITT and INFITT WG

1. Introduction 2. TAMIL LETTER SHA Character proposed in this document About INFITT and INFITT WG Dated: September 14, 2003 Title: Proposal to add TAMIL LETTER SHA Source: International Forum for Information Technology in Tamil (INFITT) Action: For consideration by UTC and ISO/IEC JTC 1/SC 2/WG 2 Distribution:

More information

Proposals For Devanagari, Gurmukhi, And Gujarati Scripts Root Zone Label Generation Rules

Proposals For Devanagari, Gurmukhi, And Gujarati Scripts Root Zone Label Generation Rules Proposals For Devanagari, Gurmukhi, And Gujarati Scripts Root Zone Label Generation Rules Publication Date: 20 October 2018 Prepared By: IDN Program, ICANN Org Public Comment Proceeding Open Date: 27 July

More information

Andrew Glass and Shriramana Sharma. anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com November-2

Andrew Glass and Shriramana Sharma. anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com November-2 Proposal to encode 1107F BRAHMI NUMBER JOINER (REVISED) Andrew Glass and Shriramana Sharma anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com 1. Background 2011-vember-2 In their Brahmi proposal L2/07-342

More information

RDBMS (Relational Database Management System) ចជ ស ថល គរ ន, ក ម ហ ន វ ជ យ ល គល កអ នកប នកន ងក រ រក ទ កពត ម នទ ងអ ស របស

RDBMS (Relational Database Management System) ចជ ស ថល គរ ន, ក ម ហ ន វ ជ យ ល គល កអ នកប នកន ងក រ រក ទ កពត ម នទ ងអ ស របស ម ម នទ ១ ក រស គ ល ព Ms. Access 2010 1. Microsoft Access 2010 Ms.Access: គ ជ កញ ច ប កម មវ ធ ដ លគគគ ប ស រ ប ក រគរ បច រក ទ ក ន ង បម លផ ទ នន យ គប ទ នន យ, ដស ងរកទ នន យ, ន ងស សង ទ នន យជ គ ម Microsoft Access

More information

The Unicode Standard Version 6.1 Core Specification

The Unicode Standard Version 6.1 Core Specification The Unicode Standard Version 6.1 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

HTML & xhtml បង ក តង ហទ ព រជ ម យ ស ក ស របប បបប ក ត បវបស យ ដ លទ ន សម យ. ត ត វត មសត ដ HTML ន xhtml ងរ បងរ ង យ ក យ ទ តយត រ

HTML & xhtml បង ក តង ហទ ព រជ ម យ ស ក ស របប បបប ក ត បវបស យ ដ លទ ន សម យ. ត ត វត មសត ដ HTML ន xhtml ងរ បងរ ង យ ក យ ទ តយត រ បង ក តង ហទ ព រជ ម យ HTML & xhtml ស ក ស របប បបប ក ត បវបស យ ដ លទ ន សម យ ត ត វត មសត ដ HTML ន xhtml ងរ បងរ ង យ ក យ ទ តយត រ ស រ ប បង នង មជ ឈមណ ឌ ល ITEC ន ច កច យត ម Bayon Hosting 1 Module 1: HTML Basic Bayon

More information

Proposal to encode the DOGRA VOWEL SIGN VOCALIC RR

Proposal to encode the DOGRA VOWEL SIGN VOCALIC RR Proposal to encode the DOGRA VOWEL SIGN VOCALIC RR Srinidhi A and Sridatta A Tumakuru, India srinidhi.pinkpetals24@gmail.com, sridatta.jamadagni@gmail.com June 25, 2017 1 Introduction This is a proposal

More information

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document.

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document. Dated: April 28, 2006 Title: Proposal to add TAMIL OM Source: International Forum for Information Technology in Tamil (INFITT) Action: For consideration by UTC and ISO/IEC JTC 1/SC 2/WG 2 Distribution:

More information

Two distinct code points: DECIMAL SEPARATOR and FULL STOP

Two distinct code points: DECIMAL SEPARATOR and FULL STOP Two distinct code points: DECIMAL SEPARATOR and FULL STOP Dario Schiavon, 207-09-08 Introduction Unicode, being an extension of ASCII, inherited a great historical mistake, namely the use of the same code

More information

097B Ä DEVANAGARI LETTER GGA 097C Å DEVANAGARI LETTER JJA 097E Ç DEVANAGARI LETTER DDDA 097F É DEVANAGARI LETTER BBA

097B Ä DEVANAGARI LETTER GGA 097C Å DEVANAGARI LETTER JJA 097E Ç DEVANAGARI LETTER DDDA 097F É DEVANAGARI LETTER BBA ISO/IEC JTC1/SC2/WG2 N2934 L2/05-082 2005-03-30 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation еждународная организация

More information

- ខ!"រ - អងស. IT (Information Technology) Dictionary English - Khmer - English

- ខ!រ - អងស. IT (Information Technology) Dictionary English - Khmer - English ខ រអយធ សរបអក Khmer IT for You www.khmeritforyou.blogspot.com អ-ស វស e-book វចននក ម អយធ (បសច វទ ពតមន) អងស - ខ!"រ - អងស IT (Iformatio Techology) Dictioary Eglish - Khmer - Eglish $រល % ដល (នខ!"រក បរ(ប សកប)ក*

More information

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646 1 A. Administrative 1. Title: Encoding of Devanagari Rupee Sign in Devanagari code

More information

ZWJ/ZWNJ. behavior under Indic scripts with special reference to chillu, conjuncts, etc in Malayalam. Rajeev J Sebastian Rachana Akshara Vedi

ZWJ/ZWNJ. behavior under Indic scripts with special reference to chillu, conjuncts, etc in Malayalam. Rajeev J Sebastian Rachana Akshara Vedi ZWJ/ZWNJ behavior under Indic scripts with special reference to chillu, conjuncts, etc in Malayalam Rajeev J Sebastian Rachana Akshara Vedi 1 Definitions ZWJ and ZWNJ are format control characters with

More information

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete*** 1 of 5 3/3/2003 1:25 PM ****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete*** ISO INTERNATIONAL ORGANIZATION

More information

Proposal to encode MALAYALAM SIGN PARA

Proposal to encode MALAYALAM SIGN PARA Proposal to encode MALAYALAM SIGN PARA Introduction Cibu Johny, cibu@google.com 2014-Jan-16 Historically Paṟa has been an important measurement unit in Kerala, for measuring rice grain. The word also described

More information

FLT: Font Layout Table

FLT: Font Layout Table FLT: Font Layout Table Kenichi Handa, Mikiko Nishikimi, Naoto Takahashi and Satoru Tomura Abstract Rendering a complex text such as one written in Indic scripts, or Complex Text Layout requires many kinds

More information

Request for encoding GRANTHA LENGTH MARK

Request for encoding GRANTHA LENGTH MARK Request for encoding 11355 GRANTHA LENGTH MARK Shriramana Sharma jamadagni-at-gmail-dot-com 2009-Oct-25 This is a request for encoding a character in the Grantha block. While I have only recently submitted

More information

Title: Application to include Arabic alphabet shapes to Arabic 0600 Unicode character set

Title: Application to include Arabic alphabet shapes to Arabic 0600 Unicode character set Title: Application to include Arabic alphabet shapes to Arabic 0600 Unicode character set Action: For consideration by UTC and ISO/IEC JTC1/SC2/WG2 Author: Mohammad Mohammad Khair Date: 17-Dec-2018 Introduction:

More information

Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS

Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS University of Michigan Ann Arbor, Michigan, U.S.A. pandey@umich.edu May 21, 2007 1 Introduction This is a proposal to encode

More information

Feedback on Draft Devanagari Script Behaviour for Hindi Ver 1.4.9

Feedback on Draft Devanagari Script Behaviour for Hindi Ver 1.4.9 Feedback on Draft Devanagari Script Behaviour for Hindi Ver 1.4.9 S. Page Remarks Concern Status No. Version No. 1 1.4.9 Test Report of Akshara, is missing. Pending Pl. check Annexure 5: Definition of

More information

Request for encoding 1CF3 ROTATED ARDHAVISARGA

Request for encoding 1CF3 ROTATED ARDHAVISARGA Request for encoding 1CF3 ROTATED ARDHAVISARGA Shriramana Sharma jamadagni-at-gmail-dot-com 2009-Oct-09 This is a request for encoding a character in the Vedic Extensions block. This character resembles

More information

Multilingual mathematical e-document processing

Multilingual mathematical e-document processing Multilingual mathematical e-document processing Azzeddine LAZREK University Cadi Ayyad, Faculty of Sciences Department of Computer Science Marrakech - Morocco lazrek@ucam.ac.ma http://www.ucam.ac.ma/fssm/rydarab

More information

Proposed Update Unicode Standard Annex #34

Proposed Update Unicode Standard Annex #34 Technical Reports Proposed Update Unicode Standard Annex #34 Version Unicode 6.3.0 (draft 1) Editors Addison Phillips Date 2013-03-29 This Version Previous Version Latest Version Latest Proposed Update

More information

transcribing Urdu or Arabic words. Accordingly, the KHHA and GHHA should be considered atomic, as Tibetan TSA, TSHA, and DZA are.

transcribing Urdu or Arabic words. Accordingly, the KHHA and GHHA should be considered atomic, as Tibetan TSA, TSHA, and DZA are. ISO/IEC JTC1/SC2/WG2 N2985 L2/05-244 2005-09-05 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация

More information

5c. Are the character shapes attached in a legible form suitable for review?

5c. Are the character shapes attached in a legible form suitable for review? ISO/IEC JTC1/SC2/WG2 N2790 L2/04-232 2004-06-10 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation еждународная организация

More information

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7 ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646 1 Please fill all the sections A, B and C below. (Please read Principles and Procedures

More information

Joiners (ZWJ/ZWNJ) with Semantic content for words in Indian subcontinent languages

Joiners (ZWJ/ZWNJ) with Semantic content for words in Indian subcontinent languages Joiners (ZWJ/ZWNJ) with Semantic content for words in Indian subcontinent languages N. Ganesan This document gives examples of Unicode joiners, ZWJ and ZWNJ where the meanings of words differ substantially

More information

ISO/IEC JTC1/SC2/WG2 N4157 L2/12-121

ISO/IEC JTC1/SC2/WG2 N4157 L2/12-121 ISO/IEC JTC1/SC2/WG2 N4157 L2/12-121 2012-04-23 Title: Proposal to Encode the Sign ANJI for Bengali Source: (pandey@umich.edu) Status: Individual Contribution Action: For consideration by UTC and WG2 Date:

More information

Rendering in Dzongkha

Rendering in Dzongkha Rendering in Dzongkha Pema Geyleg Department of Information Technology pema.geyleg@gmail.com Abstract The basic layout engine for Dzongkha script was created with the help of Mr. Karunakar. Here the layout

More information

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC / UNICODE
