Unicode definition list

2/7/2001

abstract character D abstract character sequence D accent mark alphabet alphabetic property alphabetic sorting annotation ANSI Arabic digit 1 Arabic-Indic digit ASCII assigned character 1 assigned code point 1 assigned code value 1 base character D Basic Multilingual Plane 2 bicameral BIDI bidirectional behavior bidirectional character type BD bidirectional display 1 bidirectional text big-endian binary file block 2 blocked D2 UAX15 2 BMP 2 BMP character 2 BMP code point 2 BNF BOM 2 Bopomofo boustrophedon Braille Braille pattern byte order mark D

byte sequence D byte serialization byte-swapped canonical 2 canonical composite UAX15 2 canonical decomposition D canonical equivalent D canonical ordering cantillation mark capital case case mapping case property D10a 3.4, cedilla character 2 character block 2 character class character encoding form 2 character encoding scheme 2 character property 2 character repertoire 2 character semantics D character sequence 1 character set 1 charset 1 choseong Chu Han Chu Nom CJK 1 CJKV 1 code page 1 code point D code position D code set 1 code unit D code unit sequence D

code value D code value sequence D coded character representation D coded character sequence D coded character set 2 codespace 2 collation combining character D combining character sequence D combining class D , compatibility 2 compatibility character D compatibility composite UAX15 2 compatibility decomposition D compatibility equivalent D compatibility variant 1 composed character sequence no composite character composite character sequence D composition exclusion table UAX15 2 composition version (of the UCD) UAX15 2 conformance conjunct form 2 consonant cluster consonant conjunct contextual variant control character 2 control code 2 cursive DBCS dead consonant 2 decimal digit decomposable character D decomposition D default property 4 2 defective combining character sequence D17a

demotic script dependent vowel 2 deprecated character D7a diacritic diaeresis digraph dingbat diphthong direct break UAX14 2 directional formatting codes directional override status BD directionality directionality property D9 3.4, display cell display order 1 double-byte character set ductility dynamic composition 1 East Asian ambiguous ED6 UAX11 2 East Asian full-width ED2 UAX11 2 East Asian half-width ED3 UAX11 2 East Asian narrow ED5 UAX11 2 East Asian wide ED4 UAX11 2 East Asian width ED1 UAX11 2 Eastern Arabic-Indic digit 1 EBCDIC embedding direction BD embedding level BD encapsulated text encoded character 1 encoding form 2 encoding scheme 2 equivalence 1 escape sequence European digit explicit directional embedding

explicit directional overrides fancy text 2 floating font 1 formatted text 1 formatting codes 1 FSS-UTF 1 full bidirectionality fullwidth UAX11 2 GCGID general category glyph 2 glyph code 1 glyph identifier 1 glyph image 1 glyph metrics 1 grapheme 2 2 graphic character 2 2 guillemet halant half-consonant form 2 halfwidth UAX11 2 Han characters Han Unification 2 Hangul Hangul syllable composition Hangul syllable decomposition Hangul syllable names Hanja hankaku Hanzi harakat higher-level protocol D high-surrogate D Hiragana HTML

hyphenation UAX14 2 IANA ideograph ideographic property illegal code unit sequence D no illegal code value sequence D illegal UTF-32 code unit sequence D36b illegal UTF-32BE code unit sequence D36a illegal UTF-32LE code unit sequence D36b ill-formed code unit sequence D no ill-formed code value sequence D implicit bidirectionality implicit directional marks in-band independent vowel 2 Indic digit indirect break UAX14 2 informative 2 informative property 4 2 inherent vowel 2 inner caps IPA IRG irregular code unit sequence D no irregular code value sequence D irregular UTF-32 code unit sequence D36b irregular UTF-32BE code unit sequence D36a irregular UTF-32LE code unit sequence D36b ISCII jamo jamo short name joiner 2 jongseong JTC1 jungseong kana

kanji Katakana kerning legal UTF-8 byte sequences letter 1 letter property level run BD ligature 2 line break UAX14 2 line break opportunity UAX14 2 line breaking UAX14 2 line breaking property UAX14 2 line fitting UAX14 2 little-endian logical order 2 logical store lowercase 2 low-surrogate D LSB LZW majuscule mandatory break UAX14 2 mathematical property matra 2 MBCS MIME minuscule mirrored property D10 3.4, missing glyph 1 modifier letter 2 monotonic MSB multibyte character set nekudot neutral (directional) type 3.12, neutral character 2

neutral directional character 3.12, NFC UAX15 2 NFD UAX15 2 NFKC UAX15 2 NFKD UAX15 2 noncharacter D7b non-joiner 2 nonspacing diacritic nonspacing mark D non-starter decomposition UAX15 2 normalization 2 normalization form C UAX15 2 normalization form D UAX15 2 normalization form KC UAX15 2 normalization form KD UAX15 2 normative 2 normative properties and behavior D normative property 4 2 NSM numeric property numeric value property D10b 3.4, obsolete out-of-band overfull UAX14 2 paragraph direction BD paragraph embedding level BD phoneme Pinyin pivot conversion plain text 2 plane 2 point polytonic post composition version UAX15 2 precomposed character 1 presentation form 1

primary combined (with) D4 UAX15 2 primary composite D3 UAX15 2 private use D prohibited break UAX14 2 property 1 radical rendering 2 repertoire 1 replacement character 2 replacement glyph 1 reserved 1 rich text 1 row 2 SBCS scalar value 2 script 2 script-specific (character) UAX15 2 SGML shaping characters 1 singleton character UAX15 2 small letter sorting spacing mark D special character properties D standard syllable block starter D1 UAX15 2 static form no strong (directional) type 3.12, strong directional character 3.12, supplementary character 2 supplementary code point 2 supplementary plane 2 surrogate character surrogate code point 2 surrogate pair D syllabary

syllable syllable block symmetric swapping 1 tagging TEX text element 2 titlecase 2 tone mark transcoding 1 transformation format 2 triangulation UCS 2 UCS transformation format D UCS-2 2 UCS-4 2 umlaut unassigned 1 underfull UAX14 2 unicameral Unicode 1.0 character name Unicode Character Database 2 Unicode scalar value D Unicode sequence identifier Unicode signature 2 Unicode transformation format D unification 1 uppercase 2 URO 2 USI UTF D UTF-16 D UTF-16BE D UTF-16LE D UTF-2 UTF-32 D36b UTF-32BE D36a

UTF-32LE D36b UTF-7 UTF-8 D virama visual order 2 vocalization vowel mark wchar_t 1 weak (directional) type 3.12, weak directional character 3.12, writing direction 1 writing system XML zenkaku zero width 1

Glossary. The Unicode Standard

Glossary. The Unicode Standard G Abstract Character. A unit of information used for the organization, control, or representation of textual data. (See Definition D3 in Section 3.3, Characters and Coded Representations.) Accent Mark.

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

2011 Martin v. Löwis. Data-centric XML. Character Sets

2011 Martin v. Löwis. Data-centric XML. Character Sets Data-centric XML Character Sets Character Sets: Rationale Computer stores data in sequences of bytes each byte represents a value in range 0..255 Text data are intended to denote characters, not numbers

More information

2007 Martin v. Löwis. Data-centric XML. Character Sets

2007 Martin v. Löwis. Data-centric XML. Character Sets Data-centric XML Character Sets Character Sets: Rationale Computer stores data in sequences of bytes each byte represents a value in range 0..255 Text data are intended to denote characters, not numbers

More information

The Unicode Standard Version 6.1 Core Specification

The Unicode Standard Version 6.1 Core Specification The Unicode Standard Version 6.1 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

General Structure 2. Chapter Architectural Context

General Structure 2. Chapter Architectural Context Chapter 2 General Structure 2 This chapter discusses the fundamental principles governing the design of the Unicode Standard and presents an informal overview of its main features. The chapter starts by

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

Proposed Update. Unicode Standard Annex #11

Proposed Update. Unicode Standard Annex #11 1 of 12 5/8/2010 9:14 AM Technical Reports Proposed Update Unicode Standard Annex #11 Version Unicode 6.0.0 draft 2 Authors Asmus Freytag (asmus@unicode.org) Date 2010-03-04 This Version Previous http://www.unicode.org/reports/tr11/tr11-19.html

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

General Structure 2. Chapter Architectural Context

General Structure 2. Chapter Architectural Context This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH

Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH Page 1 of 10 Technical Reports Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH Version Authors Summary This annex presents the specifications of an informative property for Unicode characters

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

To the BMP and beyond!

To the BMP and beyond! To the BMP and beyond! Eric Muller Adobe Systems Adobe Systems - To the BMP and beyond! July 20, 2006 - Slide 1 Content 1. Why Unicode 2. Character model 3. Principles of the Abstract Character Set 4.

More information

The Unicode Standard Version 6.0 Core Specification

The Unicode Standard Version 6.0 Core Specification The Unicode Standard Version 6.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard. Version 3.0. The Unicode Consortium ADDISON-WESLEY. An Imprint of Addison Wesley Longman, Inc.

The Unicode Standard. Version 3.0. The Unicode Consortium ADDISON-WESLEY. An Imprint of Addison Wesley Longman, Inc. The Unicode Standard Version 3.0 The Unicode Consortium ADDISON-WESLEY An Imprint of Addison Wesley Longman, Inc. Reading, Massachusetts Harlow, England Menlo Park, California Berkeley, California Don

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 6.2 Core Specification

The Unicode Standard Version 6.2 Core Specification The Unicode Standard Version 6.2 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

$string is used as a string under character semantics (see perlunicode). $codepoint should be an unsigned integer representing a Unicode code point.

$string is used as a string under character semantics (see perlunicode). $codepoint should be an unsigned integer representing a Unicode code point. NAME Unicode::Normalize - Unicode Normalization Forms SYNOPSIS (1) using function names exported by default: use Unicode::Normalize; $NFD_string = NFD($string); # Normalization Form D $NFC_string = NFC($string);

More information

The Unicode Standard Version 9.0 Core Specification

The Unicode Standard Version 9.0 Core Specification The Unicode Standard Version 9.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Conformance 3. Chapter Versions of the Unicode Standard

Conformance 3. Chapter Versions of the Unicode Standard This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

draft-hoffman-i18n-terms-02.txt July 18, 2001 Expires in six months Terminology Used in Internationalization in the IETF Status of this memo

draft-hoffman-i18n-terms-02.txt July 18, 2001 Expires in six months Terminology Used in Internationalization in the IETF Status of this memo Internet Draft draft-hoffman-i18n-terms-02.txt July 18, 2001 Expires in six months Paul Hoffman IMC & VPNC Status of this memo Terminology Used in Internationalization in the IETF This document is an Internet-Draft

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

Proposal to enhance the Unicode normalization algorithm

Proposal to enhance the Unicode normalization algorithm Proposal to enhance the Unicode normalization algorithm Date: June 2, 2003 Author: Jonathan Kew, SIL International Address: Horsleys Green High Wycombe Bucks HP14 3XL England Tel: 44 (1494) 682306 Email:

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

EMu Documentation. Unicode in EMu 5.0. Document Version 1. EMu 5.0

EMu Documentation. Unicode in EMu 5.0. Document Version 1. EMu 5.0 EMu Documentation Unicode in EMu 5.0 Document Version 1 EMu 5.0 Contents SECTION 1 Unicode 1 Overview 1 Code Points 3 Inputting Unicode Characters 6 Graphemes 10 Index Terms 11 SECTION 2 Searching 15

More information

Request for Comments: 3536 Category: Informational May Terminology Used in Internationalization in the IETF

Request for Comments: 3536 Category: Informational May Terminology Used in Internationalization in the IETF Network Working Group P. Hoffman Request for Comments: 3536 IMC & VPNC Category: Informational May 2003 Status of this Memo Terminology Used in Internationalization in the IETF This memo provides information

More information

OpenType Font by Harsha Wijayawardhana UCSC

OpenType Font by Harsha Wijayawardhana UCSC OpenType Font by Harsha Wijayawardhana UCSC Introduction The OpenType font format is an extension of the TrueType font format, adding support for PostScript font data. The OpenType font format was developed

More information

L2/ Title: Summary of proposed changes to EAW classification and documentation From: Asmus Freytag Date:

L2/ Title: Summary of proposed changes to EAW classification and documentation From: Asmus Freytag Date: Title: Summary of proposed changes to EAW classification and documentation From: Asmus Freytag Date: 2002-02-13 L2/02-078 1) Based on a detailed review I carried out, the following are currently supported:

More information

UNICODE CHARACTER ENCODING MODEL

UNICODE CHARACTER ENCODING MODEL 1 of 23 10/23/2008 6:11 PM Technical Reports Proposed Update Unicode Technical Report #17 UNICODE CHARACTER ENCODING MODEL Authors Ken Whistler (ken@unicode.org), Mark Davis (markdavis@google.com), Asmus

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

$string is used as a string under character semantics (see perlunicode). $code_point should be an unsigned integer representing a Unicode code point.

$string is used as a string under character semantics (see perlunicode). $code_point should be an unsigned integer representing a Unicode code point. NAME Unicode::Normalize - Unicode Normalization Forms SYNOPSIS (1) using function names exported by default: use Unicode::Normalize; $NFD_string = NFD($string); # Normalization Form D $NFC_string = NFC($string);

More information

UNICODE NORMALIZATION FORMS

UNICODE NORMALIZATION FORMS 1 of 31 10/13/2007 1:46 PM Technical Reports Proposed Update to Unicode Standard Annex #15 UNICODE NORMALIZATION FORMS Version Unicode 5.1.0 draft 4 Authors Mark Davis (mark.davis@google.com), Martin Dürst

More information

The Unicode Standard Version 6.1 Core Specification

The Unicode Standard Version 6.1 Core Specification The Unicode Standard Version 6.1 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Code Charts 17. Chapter Character Names List. Disclaimer

Code Charts 17. Chapter Character Names List. Disclaimer This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

The Unicode Standard Version 12.0 Core Specification

The Unicode Standard Version 12.0 Core Specification The Unicode Standard Version 12.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Unicode Standard Annex #9

Unicode Standard Annex #9 http://www.unicode.org/reports/tr9/tr9-24.html 1 of 30 Technical Reports Unicode Standard Annex #9 Version Unicode 6..0 Editors Date This Version Previous Version Latest Version Latest Proposed Update

More information

The Unicode Standard Version 7.0 Core Specification

The Unicode Standard Version 7.0 Core Specification The Unicode Standard Version 7.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Title: Graphic representation of the Roadmap to the BMP of the UCS

Title: Graphic representation of the Roadmap to the BMP of the UCS ISO/IEC JTC1/SC2/WG2 N2045 Title: Graphic representation of the Roadmap to the BMP of the UCS Source: Ad hoc group on Roadmap Status: Expert contribution Date: 1999-08-15 Action: For confirmation by ISO/IEC

More information

draft-ietf-idn-nameprep-07.txt Expires in six months Stringprep Profile for Internationalized Host Names

draft-ietf-idn-nameprep-07.txt Expires in six months Stringprep Profile for Internationalized Host Names Internet Draft draft-ietf-idn-nameprep-07.txt January 9, 2001 Expires in six months Paul Hoffman IMC & VPNC Marc Blanchet ViaGenie Stringprep Profile for Internationalized Host Names Status of this memo

More information

The Unicode Standard Version 6.1 Core Specification

The Unicode Standard Version 6.1 Core Specification The Unicode Standard Version 6.1 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Proposed Update. Unicode Standard Annex #15

Proposed Update. Unicode Standard Annex #15 Technical Reports Proposed Update Unicode Standard Annex #15 Version Unicode 6.3.0 (draft 2) Editors Date 2013-03-25 This Version Previous Version Latest Version Mark Davis (markdavis@google.com), Ken

More information

WG2 N3593. ISO/IEC International Standard ISO/IEC Information technology Universal Coded Character Set (UCS) Working Draft

WG2 N3593. ISO/IEC International Standard ISO/IEC Information technology Universal Coded Character Set (UCS) Working Draft WG2 N3593 ISO/IEC International Standard ISO/IEC 10646 Working Draft Information technology Universal Coded Character Set (UCS) echnologie de l information Jeu universel de caractères codés (JUC) Second

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057 ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057 Date: 1999-06-22 ISO/IEC JTC 1/SC 2 CODED CHARACTER SETS SECRETARIAT: JAPAN (JISC) DOC TYPE: TITLE: SOURCE: Other document National Body Comments on SC 2 N 3297, WD

More information

ISO/IEC INTERNATIONAL STANDARD

ISO/IEC INTERNATIONAL STANDARD INTERNATIONAL STANDARD Provläsningsexemplar / Preview ISO/IEC 10646 First edition 2003-12-15 AMENDMENT 3 2008-02-15 Information technology Universal Multiple-Octet Coded Character Set (UCS) AMENDMENT 3:

More information

Chapter 4: Computer Codes. In this chapter you will learn about:

Chapter 4: Computer Codes. In this chapter you will learn about: Ref. Page Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence Ref. Page

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

WG2 N3275. ISO/IEC International Standard ISO/IEC Information technology Universal Multiple-Octet Coded Character Set (UCS)

WG2 N3275. ISO/IEC International Standard ISO/IEC Information technology Universal Multiple-Octet Coded Character Set (UCS) WG2 N3275 ISO/IEC International Standard ISO/IEC 10646 Final Committee Draft Information technology Universal Multiple-Octet Coded Character Set (UCS) echnologie de l information Jeu universel de caractères

More information

Introduction 1. Chapter 1

Introduction 1. Chapter 1 This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

TECkit version 2.0 A Text Encoding Conversion toolkit

TECkit version 2.0 A Text Encoding Conversion toolkit TECkit version 2.0 A Text Encoding Conversion toolkit Jonathan Kew SIL Non-Roman Script Initiative (NRSI) Abstract TECkit is a toolkit for encoding conversions. It offers a simple format for describing

More information

Proposed Update Unicode Technical Standard #10

Proposed Update Unicode Technical Standard #10 of 69 7/14/2010 12:04 PM Technical Reports Proposed Update Unicode Technical Standard #10 Version 6.0.0 draft 5 Authors Editors Mark Davis (markdavis@google.com), Ken Whistler (ken@unicode.org) Date 2010-07-09

More information

Title: Graphic representation of the Roadmap to the BMP, Plane 0 of the UCS

Title: Graphic representation of the Roadmap to the BMP, Plane 0 of the UCS ISO/IEC JTC1/SC2/WG2 N2316 Title: Graphic representation of the Roadmap to the BMP, Plane 0 of the UCS Source: Ad hoc group on Roadmap Status: Expert contribution Date: 2001-01-09 Action: For confirmation

More information

UNICODE COLLATION ALGORITHM

UNICODE COLLATION ALGORITHM 1 of 55 10/13/2007 1:56 PM Technical Reports Proposed Update Unicode Technical Standard #10 UNICODE COLLATION ALGORITHM Version 5.1.0 (draft 2) Authors Mark Davis (mark.davis@google.com), Ken Whistler

More information

NRSI: Computers & Writing Systems

NRSI: Computers & Writing Systems NRSI: Computers & Writing Systems SIL HOME CONTACT US Search You are here: Encoding > Unicode Search Home Contact us General Initiative B@bel WSI Guidelines Encoding Principles Unicode Tutorials PUA Character

More information

Proposed Update Unicode Standard Annex #9

Proposed Update Unicode Standard Annex #9 Technical Reports Proposed Update Unicode Standard Annex #9 Version Unicode 6.2.1 (draft 3) Editors Date 2012-10-26 This Version Previous Version Latest Version Latest Proposed Update Revision 28 Summary

More information

ISO/IEC JTC 1/SC 2/WG 2 N2895 L2/ Date:

ISO/IEC JTC 1/SC 2/WG 2 N2895 L2/ Date: ISO International Organization for Standardization Organisation Internationale de Normalisation ISO/IEC JTC 1/SC 2/WG 2 Universal Multiple-Octet Coded Character Set (UCS) ISO/IEC JTC 1/SC 2/WG 2 N2895

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

The Use of Unicode in MARC 21 Records. What is MARC?

The Use of Unicode in MARC 21 Records. What is MARC? # The Use of Unicode in MARC 21 Records Joan M. Aliprand Senior Analyst, RLG What is MARC? MAchine-Readable Cataloging MARC is an exchange format Focus on MARC 21 exchange format An implementation may

More information

Unicode: What is it and how do I use it?

Unicode: What is it and how do I use it? Abstract: The rationale for Unicode and its design goals and detailed design principles are presented. The correspondence between Unicode and ISO/IEC 10646 is discussed, the scripts included or planned

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Multilingual mathematical e-document processing

Multilingual mathematical e-document processing Multilingual mathematical e-document processing Azzeddine LAZREK University Cadi Ayyad, Faculty of Sciences Department of Computer Science Marrakech - Morocco lazrek@ucam.ac.ma http://www.ucam.ac.ma/fssm/rydarab

More information

Two distinct code points: DECIMAL SEPARATOR and FULL STOP

Two distinct code points: DECIMAL SEPARATOR and FULL STOP Two distinct code points: DECIMAL SEPARATOR and FULL STOP Dario Schiavon, 207-09-08 Introduction Unicode, being an extension of ASCII, inherited a great historical mistake, namely the use of the same code

More information

Proposed Update Unicode Standard Annex #9

Proposed Update Unicode Standard Annex #9 1 of 52 1/30/2015 11:23 AM Technical Reports Proposed Update Unicode Standard Annex #9 Version Unicode 8.0.0 (draft 4) Editors Date 2015-01-07 This Version Previous Version Latest Version Latest Proposed

More information

The Unicode Standard Version 6.0 Core Specification

The Unicode Standard Version 6.0 Core Specification The Unicode Standard Version 6.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

The material has been modified slightly for this online edition, however

UNICODE SECURITY CONSIDERATIONS

Proposed Update Unicode Technical Report #36. Version 4 (draft 3). Authors: Mark Davis (markdavis@google.com), Michel Suignard (michel@suignard.com)

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646 / UNICODE

SC2/WG2 Nxxxx, L2/02-xxx, 2002-11-01. A. Administrative 1. Title: Proposal

The Unicode Standard Version 10.0 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

Proposed Update Unicode Technical Report #36

Editors: Mark Davis (markdavis@google.com), Michel Suignard (michel@suignard.com). Date: 2010-04-28 (draft 5).

Proposed Update Unicode Standard Annex #9

Version: Unicode 6.3.0 (draft 12). Editors: Mark Davis (markdavis@google.com), Aharon Lanin (aharon@google.com), and Andrew Glass (andrew.glass@microsoft.com)

Consent docket re WG2 Resolutions at its Meeting #35 as amended. For the complete text of Resolutions of WG2 Meeting #35, see L2/98-306R.

L2/98-389R. RESOLUTION M35.4 (PDAM-24 on Thaana): Unanimous to prepare

UNICODE IDNA COMPATIBLE PREPROCESSING

Proposed Draft Unicode Technical Standard #46. Version 1 (draft 1). Authors: Mark Davis (markdavis@google.com), Michel Suignard

ISO/IEC JTC 1/SC 2 N 3354

Date: 1999-09-02 ISO/IEC JTC 1/SC 2 CODED CHARACTER SETS SECRETARIAT: JAPAN (JISC) DOC TYPE: TITLE: SOURCE: National Body Contribution National Body Comments on SC 2 N 3331, ISO/IEC

Internet Engineering Task Force (IETF) Request for Comments: 8264 Category: Standards Track ISSN: 2070-1721 October 2017

P. Saint-Andre (Jabber.org), M. Blanchet (Viagenie). Obsoletes: 7564. Abstract PRECIS Framework:

UNICODE SCRIPT NAMES PROPERTY

Proposed Update to Unicode Standard Annex #24. Version: Unicode 5.1.0 draft2. Authors: Mark Davis (mark.davis@google.com), Ken Whistler

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

ISO INTERNATIONAL ORGANIZATION

UNICODE BIDIRECTIONAL ALGORITHM

Proposed Update Unicode Standard Annex #9. Version: Unicode 11.0.0 (draft 1). Editors: Mark Davis (markdavis@google.com), Aharon Lanin (aharon@google.com),

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA

Main points of PowerPoint presentation: This talk gives

The Unicode Standard Version 11.0 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

uniseg-python Documentation

Release 0.7.1, Masaaki Shibata, Apr 15, 2017. Contents: 1 Modules; 1.1 uniseg.codepoint (Unicode code point); 1.2 uniseg.graphemecluster (Grapheme cluster)

Proposed Update Unicode Standard Annex #34

Version: Unicode 6.3.0 (draft 1). Editors: Addison Phillips. Date: 2013-03-29.

CSS3 Text Extensions. 1 Summary. 2 Contents. Michel Suignard. Microsoft Corporation

1 Summary: This document presents new text extensions considered for CSS3 (Cascading Style Sheets). The main topics presented are layout flow, text justification, baseline

Network Working Group. Category: Informational July 1995

Request For Comments: 1815. M. Ohta, Tokyo Institute of Technology. Character Sets ISO-10646 and ISO-10646-J-1. Status of this Memo: This memo provides

Working Draft International Standard 10646, 1st Edition. Information technology: Universal Multiple-Octet Coded Character Set (UCS)

SC2/WG2 N2578. ISO/IEC WD 10646, 1st Edition, 2003-02-13.

(URW) ++ UNICODE APERÇU 1. Nimbus Sans Block Name. Regular. Bold. Light Vers Regular. Regular. Bold. Medium. Vers Vers Vers. 4.

Table of Unicode blocks and code points covered by each Nimbus Sans weight; the cell values did not survive extraction.

This document is to be used together with N2285 and N2281.

ISO/IEC JTC1/SC2/WG2 N2291, 2000-09-25. Universal Multiple-Octet Coded Character Set. International Organization for Standardization / Organisation internationale de normalisation / Международная организация по

ISO/IEC JTC1/SC2/WG2 N

ISO INTERNATIONAL ORGANIZATION FOR STANDARDIZATION / ORGANISATION INTERNATIONALE DE NORMALISATION. ISO/IEC JTC1/SC2/WG2 Universal

Rendering in Dzongkha

Pema Geyleg, Department of Information Technology, pema.geyleg@gmail.com. Abstract: The basic layout engine for Dzongkha script was created with the help of Mr. Karunakar. Here the layout

Proposal to Encode Oriya Fraction Signs in ISO/IEC 10646

University of Michigan, Ann Arbor, Michigan, U.S.A. pandey@umich.edu. December 4, 2007. Contents: Proposal Summary Form; Introduction; Characters Proposed

UNICODE COLLATION ALGORITHM

Proposed Update Unicode Technical Standard #10. Version 12.0.0. Editors: Mark Davis (markdavis@google.com), Ken Whistler (ken@unicode.org), Markus Scherer (markus.icu@gmail.com)

The Unicode Standard Version 6.0 Core Specification

To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646. Please fill all the sections A, B and C below. (Please read Principles and Procedures

ATypI Hongkong 2012: Development of a Pan-CJK Font

What is a Pan-CJK Font? Pan (Greek: ) means "all" or "involving all members" of a group. Pan-CJK means a Unicode-based font which supports different countries

ISO/IEC JTC 1/SC 2/WG 2 N3086 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS 1

PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646. Please fill all the sections A, B and C below. Please

Proposal on Handling Reph in Gurmukhi and Telugu Scripts

Nagarjuna Venna, August 1, 2006. 1 Introduction: Chapter 9 of the Unicode standard [1] describes the representational model for encoding Indic scripts.