The Unicode Standard Version 6.1 Core Specification


To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.

The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided.

Copyright Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at
For information about the Unicode terms of use, please see

The Unicode Standard / the Unicode Consortium ; edited by Julie D. Allen ... [et al.]. Version 6.1.
Includes bibliographical references and index.
ISBN
1. Unicode (Computer character set) I. Allen, Julie D. II. Unicode Consortium.
QA268.U
ISBN

Published in Mountain View, CA
April 2012

Figures

Figure 1-1. Wide ASCII
Figure 1-2. Unicode Compared to the 2022 Framework
Figure 2-1. Text Elements and Characters
Figure 2-2. Characters Versus Glyphs
Figure 2-3. Unicode Character Code to Rendered Glyphs
Figure 2-4. Bidirectional Ordering
Figure 2-5. Writing Direction and Numbers
Figure 2-6. Typeface Variation for the Bone Character
Figure 2-7. Dynamic Composition
Figure 2-8. Abstract and Encoded Characters
Figure 2-9. Overlap in Legacy Mixed-Width Encodings
Figure Boundaries and Interpretation
Figure Unicode Encoding Forms
Figure Unicode Encoding Schemes
Figure Unicode Allocation
Figure Allocation on the BMP
Figure Allocation on Plane
Figure Writing Directions
Figure Combining Enclosing Marks for Symbols
Figure Sequence of Base Characters and Diacritics
Figure Reordered Indic Vowel Signs
Figure Properties and Combining Character Sequences
Figure Stacking Sequences
Figure Ligated Multiple Base Characters
Figure Equivalent Sequences
Figure Canonical Ordering
Figure Types of Decomposables
Figure 3-1. Enclosing Marks
Figure 4-1. Positions of Common Combining Marks
Figure 5-1. Two-Stage Tables
Figure 5-2. Normalization
Figure 5-3. Consistent Character Boundaries
Figure 5-4. Dead Keys Versus Handwriting Sequence
Figure 5-5. Truncating Grapheme Clusters
Figure 5-6. Inside-Out Rule
Figure 5-7. Fallback Rendering
Figure 5-8. Bidirectional Placement
Figure 5-9. Justification
Figure Positioning with Ligatures
Figure Positioning with Contextual Forms
Figure Positioning with Enhanced Kerning
Figure Sublinear Searching
Figure Uppercase Mapping for Turkish I
Figure Lowercase Mapping for Turkish I
Figure Casing of German Sharp S
Figure 6-1. Overriding Inherent Vowels
Figure 6-2. Forms of CJK Punctuation
Figure 6-3. European Quotation Marks
Figure 6-4. Asian Quotation Marks
Figure 6-5. Examples of Ancient Greek Editorial Marks
Figure 6-6. Use of Greek Paragraphos
Figure 6-7. CJK Parentheses
Figure 7-1. Alternative Glyphs in Latin
Figure 7-2. Diacritics on i and j
Figure 7-3. Vietnamese Letters and Tone Marks
Figure 7-4. Variations in Greek Capital Letter Upsilon
Figure 7-5. Coptic Numerals
Figure 7-6. Georgian Scripts and Casing
Figure 7-7. Tone Letters
Figure 7-8. Double Diacritics
Figure 7-9. Positioning of Double Diacritics
Figure Use of CGJ with Double Diacritics
Figure Interaction of Combining Marks with Ligatures
Figure Use of Vertical Line Overlay for Negation
Figure Double Diacritics and Half Marks
Figure 8-1. Directionality and Cursive Connection
Figure 8-2. Using a Joiner
Figure 8-3. Using a Non-joiner
Figure 8-4. Combinations of Joiners and Non-joiners
Figure 8-5. Placement of Harakat
Figure 8-6. Arabic Year Sign
Figure 8-7. Syriac Abbreviation
Figure 8-8. Use of SAM
Figure 9-1. Dead Consonants in Devanagari
Figure 9-2. Conjunct Formations in Devanagari
Figure 9-3. Preventing Conjunct Forms in Devanagari
Figure 9-4. Half-Consonants in Devanagari
Figure 9-5. Independent Half-Forms in Devanagari
Figure 9-6. Half-Consonants in Oriya
Figure 9-7. Consonant Forms in Devanagari and Oriya
Figure 9-8. Rendering Order in Devanagari
Figure 9-9. Marathi Allographs
Figure Use of Apostrophe in Bodo, Dogri and Maithili
Figure Use of Avagraha in Dogri
Figure Requesting Bengali Consonant-Vowel Ligature
Figure Blocking Bengali Consonant-Vowel Ligature
Figure Bengali Syllable tta
Figure Kssa Ligature in Tamil
Figure Tamil Two-Part Vowels
Figure Vowel Reordering Around a Tamil Conjunct
Figure Tamil Ligatures with i
Figure Spacing Forms of Tamil u
Figure Tamil Ligatures with ra
Figure Traditional Tamil Ligatures with aa
Figure Traditional Tamil Ligatures with o
Figure Traditional Tamil Ligatures with ai
Figure Vowel ai in Modern Tamil
Figure Tibetan Syllable Structure
Figure Justifying Tibetan Tseks
Figure Phags-pa Syllable Om
Figure Phags-pa Reversed Shaping
Figure Geographical Extent of the Kharoshthi Script
Figure Kharoshthi Number
Figure Kharoshthi Rendering Example
Figure Consonant Ligatures in Brahmi
Figure Common Ligatures in Khmer
Figure Common Multiple Forms in Khmer
Figure Examples of Syllabic Order in Khmer
Figure Ligation in Muul Style in Khmer
Figure Buginese Ligature
Figure Writing dharma in Balinese
Figure Representation of Javanese Two-Part Vowels
Figure Han Spelling
Figure Semantic Context for Han Characters
Figure Three-Dimensional Conceptual Model
Figure CJK Source Separation
Figure Not Cognates, Not Unified
Figure Ideographic Component Structure
Figure The Most Superior Node of an Ideographic Component
Figure Using the Ideographic Description Characters
Figure Japanese Historic Kana for e and ye
Figure Mongolian Glyph Convergence
Figure Mongolian Consonant Ligation
Figure Mongolian Positional Forms
Figure Mongolian Free Variation Selector
Figure Mongolian Gender Forms
Figure Mongolian Vowel Separator
Figure Tifinagh Contextual Shaping
Figure Tifinagh Consonant Joiner and Bi-consonants
Figure Examples of N'Ko Ordinals
Figure Short Words Equivalent to Deseret Letter Names
Figure Distribution of Old Italic
Figure Interpretation of Hieroglyphic Markup
Figure Alternative Glyphs for Dollar Sign
Figure Alternative Glyphs for Numero Sign
Figure Wide Mathematical Accents
Figure Style Variants and Semantic Distinctions in Mathematics
Figure Easily Confused Shapes for Mathematical Glyphs
Figure CJK Ideographic Numbers
Figure Regular and Old Style Digits
Figure Alternate Forms of Vulgar Fractions
Figure Usage of Crops and Quine Corners
Figure Usage of the Decimal Exponent Symbol
Figure Examples of Specialized Music Layout
Figure Precomposed Note Characters
Figure Alternative Noteheads
Figure Augmentation Dots and Articulation Symbols
Figure Prevention of Joining
Figure Exhibition of Joining Glyphs in Isolation
Figure Effect of Intervening Joiners
Figure Annotation Characters
Figure Tag Characters
Figure CJK Chart Format for the Main CJK Block
Figure CJK Chart Format for CJK Extension A
Figure CJK Chart Format for CJK Extension B
Figure CJK Chart Format for Compatibility Ideographs
Figure A-1. Example of Rendering

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 12.0 Core Specification

The Unicode Standard Version 12.0 Core Specification The Unicode Standard Version 12.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Unicode definition list

Unicode definition list abstract character D3 3.3 2 abstract character sequence D4 3.3 2 accent mark alphabet alphabetic property 4.10 2 alphabetic sorting annotation ANSI Arabic digit 1 Arabic-Indic digit 3.12 1 ASCII assigned

More information

The Unicode Standard Version 12.0 Core Specification

The Unicode Standard Version 12.0 Core Specification The Unicode Standard Version 12.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

Introduction 1. Chapter 1

Introduction 1. Chapter 1 This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

The Unicode Standard Version 6.1 Core Specification

The Unicode Standard Version 6.1 Core Specification The Unicode Standard Version 6.1 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

(URW) ++ UNICODE APERÇU 1. Nimbus Sans Block Name. Regular. Bold. Light Vers Regular. Regular. Bold. Medium. Vers Vers Vers. 4.

(URW) ++ UNICODE APERÇU 1. Nimbus Sans Block Name. Regular. Bold. Light Vers Regular. Regular. Bold. Medium. Vers Vers Vers. 4. UNICODE APERÇU 1 Unicode Code points (Plane, Plane 2) 93+9 HKSCS Alternates 8498 8498 31 425 1 Latin Extended-A 5 U+2FF U+52F U+4FF U+F U+5 U+5FF U+7 U+74F U+6FF U+77F U+7 U+7BF U+ U+97F U+7FF U+9FF U+A7F

More information

General Structure 2. Chapter Architectural Context

General Structure 2. Chapter Architectural Context This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

The Unicode Standard Version 7.0 Core Specification

The Unicode Standard Version 7.0 Core Specification The Unicode Standard Version 7.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Title: Graphic representation of the Roadmap to the BMP of the UCS

Title: Graphic representation of the Roadmap to the BMP of the UCS ISO/IEC JTC1/SC2/WG2 N2045 Title: Graphic representation of the Roadmap to the BMP of the UCS Source: Ad hoc group on Roadmap Status: Expert contribution Date: 1999-08-15 Action: For confirmation by ISO/IEC

More information

Title: Graphic representation of the Roadmap to the BMP, Plane 0 of the UCS

Title: Graphic representation of the Roadmap to the BMP, Plane 0 of the UCS ISO/IEC JTC1/SC2/WG2 N2316 Title: Graphic representation of the Roadmap to the BMP, Plane 0 of the UCS Source: Ad hoc group on Roadmap Status: Expert contribution Date: 2001-01-09 Action: For confirmation

More information

Code Charts 17. Chapter Character Names List. Disclaimer

Code Charts 17. Chapter Character Names List. Disclaimer This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

The Unicode Standard Version 6.0 Core Specification

The Unicode Standard Version 6.0 Core Specification The Unicode Standard Version 6.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

Thu Jun :48:11 Canada/Eastern

Thu Jun :48:11 Canada/Eastern Roadmaps to Unicode Thu Jun 24 2004 17:48:11 Canada/Eastern Home Site Map Search Tables Roadmap Introduction Roadmap to the BMP (Plane 0) Roadmap to the SMP (Plane 1) Roadmap to the SIP (Plane 2) Roadmap

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

UNICODE SCRIPT NAMES PROPERTY

UNICODE SCRIPT NAMES PROPERTY 1 of 10 1/29/2008 10:29 AM Technical Reports Proposed Update to Unicode Standard Annex #24 UNICODE SCRIPT NAMES PROPERTY Version Unicode 5.1.0 draft2 Authors Mark Davis (mark.davis@google.com), Ken Whistler

More information

Proposal on Handling Reph in Gurmukhi and Telugu Scripts

Proposal on Handling Reph in Gurmukhi and Telugu Scripts Proposal on Handling Reph in Gurmukhi and Telugu Scripts Nagarjuna Venna August 1, 2006 1 Introduction Chapter 9 of the Unicode standard [1] describes the representational model for encoding Indic scripts.

More information

Andrew Glass and Shriramana Sharma. anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com November-2

Andrew Glass and Shriramana Sharma. anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com November-2 Proposal to encode 1107F BRAHMI NUMBER JOINER (REVISED) Andrew Glass and Shriramana Sharma anglass-at-microsoft-dot-com jamadagni-at-gmail-dot-com 1. Background 2011-vember-2 In their Brahmi proposal L2/07-342

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

2011 Martin v. Löwis. Data-centric XML. Character Sets

2011 Martin v. Löwis. Data-centric XML. Character Sets Data-centric XML Character Sets Character Sets: Rationale Computer stores data in sequences of bytes each byte represents a value in range 0..255 Text data are intended to denote characters, not numbers

More information

2007 Martin v. Löwis. Data-centric XML. Character Sets

2007 Martin v. Löwis. Data-centric XML. Character Sets Data-centric XML Character Sets Character Sets: Rationale Computer stores data in sequences of bytes each byte represents a value in range 0..255 Text data are intended to denote characters, not numbers

More information

To the BMP and beyond!

To the BMP and beyond! To the BMP and beyond! Eric Muller Adobe Systems Adobe Systems - To the BMP and beyond! July 20, 2006 - Slide 1 Content 1. Why Unicode 2. Character model 3. Principles of the Abstract Character Set 4.

More information

ISO/IEC JTC 1/SC 2 N 3426

ISO/IEC JTC 1/SC 2 N 3426 ISO/IEC JTC 1/SC 2 N 3426 Date: 2000-04-04 Supersedes SC 2 N 2830 ISO/IEC JTC 1/SC 2 CODED CHARACTER SETS SECRETARIAT: JAPAN (JISC) DOC TYPE: TITLE: Other document Graphic representation of the Roadmap

More information

Proposed Update. Unicode Standard Annex #11

Proposed Update. Unicode Standard Annex #11 1 of 12 5/8/2010 9:14 AM Technical Reports Proposed Update Unicode Standard Annex #11 Version Unicode 6.0.0 draft 2 Authors Asmus Freytag (asmus@unicode.org) Date 2010-03-04 This Version Previous http://www.unicode.org/reports/tr11/tr11-19.html

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

Google Search Appliance

Google Search Appliance Google Search Appliance Search Appliance Internationalization Google Search Appliance software version 7.2 and later Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-INTL_200.01

More information

Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH

Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH Page 1 of 10 Technical Reports Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH Version Authors Summary This annex presents the specifications of an informative property for Unicode characters

More information

The Unicode Standard Version 6.2 Core Specification

The Unicode Standard Version 6.2 Core Specification The Unicode Standard Version 6.2 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

FLT: Font Layout Table

FLT: Font Layout Table FLT: Font Layout Table Kenichi Handa, Mikiko Nishikimi, Naoto Takahashi and Satoru Tomura Abstract Rendering a complex text such as one written in Indic scripts, or Complex Text Layout requires many kinds

More information

Multilingual mathematical e-document processing

Multilingual mathematical e-document processing Multilingual mathematical e-document processing Azzeddine LAZREK University Cadi Ayyad, Faculty of Sciences Department of Computer Science Marrakech - Morocco lazrek@ucam.ac.ma http://www.ucam.ac.ma/fssm/rydarab

More information

Kannada 2. L2/ Representation of Jihvamuliya and Upadhmaniya in Kannada Srinidhi

Kannada 2. L2/ Representation of Jihvamuliya and Upadhmaniya in Kannada Srinidhi TO: UTC L2/14 XXX FROM: Deborah Anderson, Ken Whistler, Rick McGowan, Roozbeh Pournader, and Laurentiu Iancu SUBJECT: Recommendations to UTC #138 February 2014 on Script Proposals DATE: 26 January 2014

More information

The Unicode Standard. Version 3.0. The Unicode Consortium ADDISON-WESLEY. An Imprint of Addison Wesley Longman, Inc.

The Unicode Standard. Version 3.0. The Unicode Consortium ADDISON-WESLEY. An Imprint of Addison Wesley Longman, Inc. The Unicode Standard Version 3.0 The Unicode Consortium ADDISON-WESLEY An Imprint of Addison Wesley Longman, Inc. Reading, Massachusetts Harlow, England Menlo Park, California Berkeley, California Don

More information

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA Transliteration of Tamil and Other Indic Scripts Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA Main points of Powerpoint presentation This talk gives

More information

OpenType Font by Harsha Wijayawardhana UCSC

OpenType Font by Harsha Wijayawardhana UCSC OpenType Font by Harsha Wijayawardhana UCSC Introduction The OpenType font format is an extension of the TrueType font format, adding support for PostScript font data. The OpenType font format was developed

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Unicode: What is it and how do I use it?

Unicode: What is it and how do I use it? Abstract: The rationale for Unicode and its design goals and detailed design principles are presented. The correspondence between Unicode and ISO/IEC 10646 is discussed, the scripts included or planned

More information

Rendering in Dzongkha

Rendering in Dzongkha Rendering in Dzongkha Pema Geyleg Department of Information Technology pema.geyleg@gmail.com Abstract The basic layout engine for Dzongkha script was created with the help of Mr. Karunakar. Here the layout

More information

Consent docket re WG2 Resolutions at its Meeting #35 as amended. For the complete text of Resolutions of WG2 Meeting #35, see L2/98-306R.

Consent docket re WG2 Resolutions at its Meeting #35 as amended. For the complete text of Resolutions of WG2 Meeting #35, see L2/98-306R. L2/98-389R Consent docket re WG2 Resolutions at its Meeting #35 as amended For the complete text of Resolutions of WG2 Meeting #35, see L2/98-306R. RESOLUTION M35.4 (PDAM-24 on Thaana): Unanimous to prepare

More information

General Structure 2. Chapter Architectural Context

General Structure 2. Chapter Architectural Context Chapter 2 General Structure 2 This chapter discusses the fundamental principles governing the design of the Unicode Standard and presents an informal overview of its main features. The chapter starts by

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

JAVA.LANG.CHARACTER.UNICODEBLOCK CLASS

JAVA.LANG.CHARACTER.UNICODEBLOCK CLASS JAVA.LANG.CHARACTER.UNICODEBLOCK CLASS http://www.tutorialspoint.com/java/lang/java_lang_character.unicodehtm Copyright tutorialspoint.com Introduction The java.lang.character.unicodeblock class is a family

More information

Proposed Update Unicode Standard Annex #34

Proposed Update Unicode Standard Annex #34 Technical Reports Proposed Update Unicode Standard Annex #34 Version Unicode 6.3.0 (draft 1) Editors Addison Phillips Date 2013-03-29 This Version Previous Version Latest Version Latest Proposed Update

More information

Unicode and Standardized Notation. Anthony Aristar

Unicode and Standardized Notation. Anthony Aristar Data Management and Archiving University of California at Santa Barbara, June 24-27, 2008 Unicode and Standardized Notation Anthony Aristar Once upon a time There were people who decided to invent computers.

More information

Information technology Universal Multiple-Octet Coded Character Set (UCS) AMENDMENT 2: N Ko, Phags-pa, Phoenician and other characters

Information technology Universal Multiple-Octet Coded Character Set (UCS) AMENDMENT 2: N Ko, Phags-pa, Phoenician and other characters Information technology Universal Multiple-Octet Coded Character Set (UCS) AMENDMENT 2: N Ko, Phags-pa, Phoenician and other characters Page 1, Clause 1 Scope In the note, update the Unicode Standard version

More information

RomanCyrillic Std v. 7

RomanCyrillic Std v. 7 https://doi.org/10.20378/irbo-52591 RomanCyrillic Std v. 7 Online Documentation incl. support for Unicode v. 9, 10, and 11 (2016 2018) UNi code A З PDF! Ѿ Sebastian Kempgen 2018 RomanCyrillic Std: new

More information

Request for encoding GRANTHA LENGTH MARK

Request for encoding GRANTHA LENGTH MARK Request for encoding 11355 GRANTHA LENGTH MARK Shriramana Sharma jamadagni-at-gmail-dot-com 2009-Oct-25 This is a request for encoding a character in the Grantha block. While I have only recently submitted

More information

COSC 243 (Computer Architecture)

COSC 243 (Computer Architecture) COSC 243 Computer Architecture And Operating Systems 1 Dr. Andrew Trotman Instructors Office: 123A, Owheo Phone: 479-7842 Email: andrew@cs.otago.ac.nz Dr. Zhiyi Huang (course coordinator) Office: 126,

More information

Because of these dispositions, Ireland changed its vote to Yes, leaving only one Negative vote (Japan)

Because of these dispositions, Ireland changed its vote to Yes, leaving only one Negative vote (Japan) ISO/IEC JTC1/SC2/WG2 N 4871 Date: 2017-09-28 ISO/IEC JTC1/SC2/WG2 Coded Character Set Secretariat: Japan (JISC) Doc. Type: Disposition of comments Title: Disposition of comments on PDAM2 to ISO/IEC 10646

More information

Proposal to encode Devanagari Sign High Spacing Dot

Proposal to encode Devanagari Sign High Spacing Dot Proposal to encode Devanagari Sign High Spacing Dot Jonathan Kew, Steve Smith SIL International April 20, 2006 1. Introduction In several language communities of Nepal, the Devanagari script has been adapted

More information

ISO/TC46/SC4/WG1 N 240, ISO/TC46/SC4/WG1 N

ISO/TC46/SC4/WG1 N 240, ISO/TC46/SC4/WG1 N L2/00-220 Title: Finalized Mapping between Characters of ISO 5426 and ISO/IEC 10646-1 (UCS) Source: The Research Libraries Group, Inc. Status: L2 Member Contribution References: ISO/TC46/SC4/WG1 N 240,

More information

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete*** 1 of 5 3/3/2003 1:25 PM ****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete*** ISO INTERNATIONAL ORGANIZATION

More information

The Unicode Standard Version 9.0 Core Specification

The Unicode Standard Version 9.0 Core Specification The Unicode Standard Version 9.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Domain Names in Pakistani Languages. IDNs for Pakistani Languages

Domain Names in Pakistani Languages. IDNs for Pakistani Languages ا ہ 6 5 a ز @ ں ب Domain Names in Pakistani Languages س a ی س a ب او اور را < ہ ر @ س a آف ا ر ا 6 ب 1 Domain name Domain name is the address of the web page pg on which the content is located 2 Internationalized

More information

Proposals For Devanagari, Gurmukhi, And Gujarati Scripts Root Zone Label Generation Rules

Proposals For Devanagari, Gurmukhi, And Gujarati Scripts Root Zone Label Generation Rules Proposals For Devanagari, Gurmukhi, And Gujarati Scripts Root Zone Label Generation Rules Publication Date: 20 October 2018 Prepared By: IDN Program, ICANN Org Public Comment Proceeding Open Date: 27 July

More information

The Unicode Standard Version 6.0 Core Specification

The Unicode Standard Version 6.0 Core Specification The Unicode Standard Version 6.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Two distinct code points: DECIMAL SEPARATOR and FULL STOP

Two distinct code points: DECIMAL SEPARATOR and FULL STOP Two distinct code points: DECIMAL SEPARATOR and FULL STOP Dario Schiavon, 207-09-08 Introduction Unicode, being an extension of ASCII, inherited a great historical mistake, namely the use of the same code

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Glossary. The Unicode Standard

Glossary. The Unicode Standard G Abstract Character. A unit of information used for the organization, control, or representation of textual data. (See Definition D3 in Section 3.3, Characters and Coded Representations.) Accent Mark.

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

A Generalized Unified Character Code: Western European and CJK Sections

Network Working Group. Request for Comments: 5242. Category: Informational. J. Klensin, H. Alvestrand (Google), 1 April 2008. Status of This

1. Introduction 2. TAMIL DIGIT ZERO JTC1/SC2/WG2 N Character proposed in this document About INFITT and INFITT WG

JTC1/SC2/WG2 N2741. Dated: February 1, 2004. Title: Proposal to add Tamil Digit Zero (DRAFT). Source: International Forum for Information Technology in Tamil (INFITT). Action: For consideration by UTC and

Multimedia Data

Multimedia Data: Text, Vector Graphics, 3-D Vector Graphics, Raster Graphics, Digital Image, Voxel, Audio, Digital Video. 1 Text: There are three types of text that are used to produce pages of documents

The Unicode Standard Version 6.1 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

JTC1/SC2/WG2 N

Universal Multiple-Octet Coded Character Set. International Organization for Standardization / Organisation Internationale de Normalisation / Международная организация по стандартизации. Doc Type: Working Group

The Unicode Standard Version 6.0 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

Template for comments and secretariat observations: ISO/IEC 10646:2014 PDAM2

Date: 2014-08-04. Document: ISO/IEC 10646:2014 PDAM2. GB1, 4.3, ed: Subclause title incorrectly refers to CJK ideographs. Change

Chapter 3: Conformance. Versions of the Unicode Standard

This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

Request for encoding 1CF4 VEDIC TONE CANDRA ABOVE

JTC1/SC2/WG2 N3844. Shriramana Sharma, jamadagni-at-gmail-dot-com, 2009-Oct-11. This is a request for encoding a character in the Vedic Extensions block. This

The Unicode Standard Version 10.0 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

The Unicode Standard Version 6.2 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

L2/ ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646. Please fill all the sections A, B and C below. Please read Principles and Procedures

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document.

Dated: April 28, 2006. Title: Proposal to add TAMIL OM. Source: International Forum for Information Technology in Tamil (INFITT). Action: For consideration by UTC and ISO/IEC JTC 1/SC 2/WG 2. Distribution:

Using non-latin alphabets in Blaise

Rob Groeneveld, Statistics Netherlands. 1. Basic techniques with fonts. In the Data Entry Program in Blaise, it is possible to use different fonts. Here, we show an example

1. Introduction 2. TAMIL LETTER SHA Character proposed in this document About INFITT and INFITT WG

Dated: September 14, 2003. Title: Proposal to add TAMIL LETTER SHA. Source: International Forum for Information Technology in Tamil (INFITT). Action: For consideration by UTC and ISO/IEC JTC 1/SC 2/WG 2. Distribution:

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

The material has been modified slightly for this online edition, however

Ꞑ A790 LATIN CAPITAL LETTER A WITH SPIRITUS LENIS ꞑ A791 LATIN SMALL LETTER A WITH SPIRITUS LENIS

ISO/IEC JTC1/SC2/WG2 N3487, L2/08-272, 2008-08-04. Universal Multiple-Octet Coded Character Set. International Organization for Standardization / Organisation Internationale de Normalisation / Международная организация

TECkit version 2.0 A Text Encoding Conversion toolkit

Jonathan Kew, SIL Non-Roman Script Initiative (NRSI). Abstract: TECkit is a toolkit for encoding conversions. It offers a simple format for describing

ISO/IEC INTERNATIONAL STANDARD

ISO/IEC 10646, First edition 2003-12-15, AMENDMENT 3, 2008-02-15. Information technology: Universal Multiple-Octet Coded Character Set (UCS). AMENDMENT 3:

Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS

University of Michigan, Ann Arbor, Michigan, U.S.A. pandey@umich.edu. May 21, 2007. 1 Introduction. This is a proposal to encode

Draft. Unicode Technical Report #49

Editors: Ken Whistler. Date: 2011-07-12. This Version: http://www.unicode.org/reports/tr49/tr49-2.html. Previous Version: http://www.unicode.org/reports/tr49/tr49-1.html

The Adobe-CNS1-6 Character Collection

Adobe Enterprise & Developer Support. Adobe Technical Note #. Introduction: The purpose of this document is to define and describe the Adobe-CNS1-6 character collection,

1 ISO/IEC JTC1/SC2/WG2 N

ISO/IEC JTC1/SC2/WG2 N2816, 2004-06-18. Universal Multiple Octet Coded Character Set. International Organization for Standardization / Organisation internationale de normalisation. ISO/IEC JTC 1/SC 2/WG 2

Proposal to Encode Oriya Fraction Signs in ISO/IEC 10646

University of Michigan, Ann Arbor, Michigan, U.S.A. pandey@umich.edu. December 4, 2007. Contents: Proposal Summary Form; Introduction; Characters Proposed

Terminology Used in Internationalization in the IETF

Internet Draft draft-hoffman-i18n-terms-02.txt. Paul Hoffman, IMC & VPNC. July 18, 2001. Expires in six months. Status of this memo: This document is an Internet-Draft

Proposal to encode three Arabic characters for Arwi

Roozbeh Pournader, Google (roozbeh@google.com). June 24, 2013. Requested action: I would like to ask the UTC and the WG2 to encode the following three Arabic

Proposal to encode the DOGRA VOWEL SIGN VOCALIC RR

Srinidhi A and Sridatta A, Tumakuru, India. srinidhi.pinkpetals24@gmail.com, sridatta.jamadagni@gmail.com. June 25, 2017. 1 Introduction. This is a proposal

Blending Content for South Asian Language Pedagogy Part 2: South Asian Languages on the Internet

A. Sean Pue, South Asia Language Resource Center. Pre-SASLI Workshop, 6/7/09. Objectives: To understand how

The Unicode Standard Version 10.0 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

Designing & Developing Pan-CJK Fonts for Today

Ken Lunde, Adobe Systems Incorporated. 2009 Adobe Systems Incorporated. All rights reserved. What Is A Pan-CJK Font? A Pan-CJK font includes glyphs suitable

Proposal to encode the SANDHI MARK for Newa

Srinidhi A and Sridatta A, Tumakuru, India. srinidhi.pinkpetals24@gmail.com, sridatta.jamadagni@gmail.com. December 23, 2016. 1 Introduction. This is a proposal to

Extensible Rendering for Complex Writing Systems

Sharon Correll, SIL International. 1 Introduction. Those needing to work with multilingual text, particularly using any kind of complex script, commonly run

Structure Vowel signs are used in a manner similar to that employed by other Brahmi-derived scripts. Consonants have an inherent /a/ vowel sound.

ISO/IEC JTC1/SC2/WG2 N3023, L2/06-003, 2006-01-11. Universal Multiple-Octet Coded Character Set. International Organization for Standardization / Organisation Internationale de Normalisation / Международная организация