The Unicode Standard, Version 3.0. The Unicode Consortium. Addison-Wesley, an imprint of Addison Wesley Longman, Inc.

The Unicode Standard, Version 3.0
The Unicode Consortium
Addison-Wesley, an imprint of Addison Wesley Longman, Inc.
Reading, Massachusetts; Harlow, England; Menlo Park, California; Berkeley, California; Don Mills, Ontario; Sydney; Bonn; Amsterdam; Tokyo; Mexico City

Acknowledgments
Unicode Consortium Members and Directors
  Full Members
  Current Associate Members
  Current Liaison Members
  Current Specialist Members
  Current Individual Members
  Current Members of the Board of Directors
  Former Members of the Board of Directors
Contents
Figures
Tables
Preface
  0.1 About the Unicode Standard
    Concepts, Architecture, Conformance, and Guidelines
    Character Block Descriptions
    Charts and Index
    Appendices and Tables
    The Unicode Character Database and Technical Reports
    On the CD-ROM
  0.2 Notational Conventions
    Extended BNF
    Operators
  0.3 Resources
    Unicode Web Site
    Unicode Anonymous FTP Site
    Unicode Public Mailing List
    How to Contact the Unicode Consortium
1 Introduction
  Coverage
    Standards Coverage
    New Characters
  Design Basis
  Text Handling
    Interpreting Characters
    Text Elements
  The Unicode Standard and ISO/IEC 10646
  The Unicode Consortium
    The Unicode Technical Committee
2 General Structure
  Architectural Context
    Basic Text Processes
    Text Elements, Code Values, and Text Processes

    Text Processes and Encoding
  2.2 Unicode Design Principles
    Sixteen-Bit Character Codes
    Efficiency
    Characters, Not Glyphs
    Semantics
    Plain Text
    Logical Order
    Unification
    Dynamic Composition
    Equivalent Sequence
    Convertibility
  Encoding Forms
    UTF-16
    UTF-8
    Character Encoding Schemes
  Unicode Allocation
    Allocation Areas
    Codespace Assignment for Graphic Characters
    Nongraphic Characters, Reserved and Unassigned Codes
  Writing Direction
  Combining Characters
    Sequence of Base Characters and Diacritics
    Multiple Combining Characters
    Multiple Base Characters
    Spacing Clones of European Diacritical Marks
  Special Character and Noncharacter Values
    Byte Order Mark (BOM)
    Special Noncharacter Values
    Separators
    Layout and Format Control Characters
    The Replacement Character
  Controls and Control Sequences
    Control Characters
    Representing Control Sequences
  Conforming to the Unicode Standard
    Characters Not Used in a Subset
    Referencing Versions of the Unicode Standard
3 Conformance
  Conformance Requirements
    Byte Ordering
    Invalid Code Values
    Interpretation
    Modification
    Transformations
    Bidirectional Text
    Unicode Technical Reports
  Semantics
  Characters and Coded Representations
  Simple Properties
  Combination

  3.6 Decomposition
    Compatibility Decomposition
    Canonical Decomposition
  Surrogates
  Transformations
  Special Character Properties
  Canonical Ordering Behavior
    Combining Classes
    Canonical Ordering
    Use with Collation
  Conjoining Jamo Behavior
    Syllable Boundaries
    Standard Syllables
    Hangul Syllable Composition
    Hangul Syllable Decomposition
    Hangul Syllable Names
  Bidirectional Behavior
    Directional Formatting Codes
    Basic Display Algorithm
    Definitions
    Resolving Embedding Levels
    Reordering Resolved Levels
    Bidirectional Conformance
    Implementation Notes
4 Character Properties
  Case (Normative)
  Combining Classes (Normative)
  Directionality (Normative)
  Jamo Short Names (Normative)
  General Category (Normative in Part)
  Numeric Value (Normative)
  Mirrored (Normative)
  Unicode 1.0 Names
  Mathematical Property
  Letters and Other Useful Properties
5 Implementation Guidelines
  Transcoding to Other Standards
    Issues
    Multistage Tables
    7-Bit or 8-Bit Transmission
    Mapping Table Resources
  ANSI/ISO C wchar_t
  Unknown and Missing Characters
    Unassigned and Private Use Character Codes
    Interpretable but Unrenderable Characters
    Reassigned Characters
  Handling Surrogate Pairs
  Handling Numbers
  5.6 Handling Properties

  5.7 Normalization
  5.8 Compression
  Line Handling
  Regular Expressions
  Language Information in Plain Text
    Requirements for Language Tagging
    Working with Language Tags
    Language Tags and Han Unification
  Editing and Selection
    Consistent Text Elements
  Strategies for Handling Nonspacing Marks
    Keyboard Input
    Truncation
  Rendering Nonspacing Marks
    Positioning Methods
  Locating Text Element Boundaries
    Boundary Specification
    Example Specifications
    Grapheme Boundaries
    Word Boundaries
    Line Boundaries
    Sentence Boundaries
    Random Access
  Identifiers
    Syntactic Rule
  Sorting and Searching
    Culturally Expected Sorting
    Unicode Character Equivalence
    Similar Characters
    Levels of Comparison
    Ignorable Characters
    Multiple Mappings
    Collating Out-of-Scope Characters
    Unmapped Characters
    Parameterization
    Optimizations
    Searching
    Sublinear Searching
  Case Mappings
6 Punctuation
  6.1 General Punctuation
    Punctuation: U+0020-U+00BF
    General Punctuation: U+2000-U+206F
    CJK Symbols and Punctuation: U+3000-U+303F
    CJK Compatibility Forms: U+FE30-U+FE4F
    Small Form Variants: U+FE50-U+FE6F
7 European Alphabetic Scripts
  Latin
    Letters of Basic Latin: U+0041-U+007A
    Letters of the Latin-1 Supplement: U+00C0-U+00FF
    Latin Extended-A: U+0100-U+017F

    Latin Extended-B: U+0180-U+024F
    IPA Extensions: U+0250-U+02AF
    Latin Extended Additional: U+1E00-U+1EFF
    Latin Ligatures: U+FB00-U+FB06
  Greek
    Greek: U+0370-U+03FF
    Greek Extended: U+1F00-U+1FFF
  Cyrillic
    Cyrillic: U+0400-U+04FF
  Armenian
    Armenian: U+0530-U+058F
  Georgian
    Georgian: U+10A0-U+10FF
  Runic
    Runic: U+16A0-U+16F0
  Ogham
    Ogham: U+1680-U+169F
  Modifier Letters
    Spacing Modifier Letters: U+02B0-U+02FF
  Combining Marks
    Combining Diacritical Marks: U+0300-U+036F
    Combining Marks for Symbols: U+20D0-U+20FF
    Combining Half Marks: U+FE20-U+FE2F
8 Middle Eastern Scripts
  Hebrew
    Hebrew: U+0590-U+05FF
    Alphabetic Presentation Forms: U+FB1D-U+FB4F
  Arabic
    Arabic: U+0600-U+06FF
    Cursive Joining
    Ligatures
    Arabic Presentation Forms-A: U+FB50-U+FDFF
    Arabic Presentation Forms-B: U+FE70-U+FEFF
  Syriac
    Syriac: U+0700-U+074F
    Syriac Shaping
    Syriac Cursive Joining
    Ligatures
  Thaana
    Thaana: U+0780-U+07BF
9 South and Southeast Asian Scripts
  Devanagari
    Devanagari: U+0900-U+097F
  Bengali
    Bengali: U+0980-U+09FF
  Gurmukhi
    Gurmukhi: U+0A00-U+0A7F
  Gujarati
    Gujarati: U+0A80-U+0AFF
  Oriya
    Oriya: U+0B00-U+0B7F

  9.6 Tamil
    Tamil: U+0B80-U+0BFF
  Telugu
    Telugu: U+0C00-U+0C7F
  Kannada
    Kannada: U+0C80-U+0CFF
  Malayalam
    Malayalam: U+0D00-U+0D7F
  Sinhala
    Sinhala: U+0D80-U+0DFF
  Thai
    Thai: U+0E00-U+0E7F
  Lao
    Lao: U+0E80-U+0EFF
  Tibetan
    Tibetan: U+0F00-U+0FBF
  Myanmar
    Myanmar: U+1000-U+109F
  Khmer
    Khmer: U+1780-U+17FF
10 East Asian Scripts
  Han
    CJK Unified Ideographs
    CJK Compatibility Ideographs: U+F900-U+FAFF
    Kanbun: U+3190-U+319F
    CJK and KangXi Radicals: U+2E80-U+2FD5
    Ideographic Description: U+2FF0-U+2FFB
  Hiragana
    Hiragana: U+3040-U+309F
  Katakana
    Katakana: U+30A0-U+30FF
    Halfwidth and Fullwidth Forms: U+FF00-U+FFEF
  Hangul
    Hangul Jamo: U+1100-U+11FF
    Hangul Compatibility Jamo: U+3130-U+318F
    Hangul Syllables: U+AC00-U+D7A3
  Bopomofo
    Bopomofo: U+3100-U+312F
  10.6 Yi
    Yi: U+A000-U+A4CF
11 Additional Scripts
  Ethiopic
    Ethiopic: U+1200-U+137F
  Cherokee
    Cherokee: U+13A0-U+13FF
  Canadian Aboriginal Syllabics
    Canadian Aboriginal Syllabics: U+1400-U+167F
  Mongolian
    Mongolian: U+1800-U+18AF
12 Symbols

  12.1 Currency Symbols
    Currency Symbols: U+20A0-U+20CF
  Letterlike Symbols
    Letterlike Symbols: U+2100-U+214F
  Number Forms
    Number Forms: U+2150-U+218F
    Superscripts and Subscripts: U+2070-U+209F
  Mathematical Operators
    Mathematical Operators: U+2200-U+22FF
    Arrows: U+2190-U+21FF
  Technical Symbols
    Control Pictures: U+2400-U+243F
    Miscellaneous Technical: U+2300-U+23FF
    Optical Character Recognition: U+2440-U+245F
  Geometrical Symbols
    Box Drawing: U+2500-U+257F
    Block Elements: U+2580-U+259F
    Geometric Shapes: U+25A0-U+25FF
  Miscellaneous Symbols and Dingbats
    Miscellaneous Symbols: U+2600-U+26FF
    Dingbats: U+2700-U+27BF
  Enclosed and Square
    Enclosed Alphanumerics: U+2460-U+24FF
    Enclosed CJK Letters and Months: U+3200-U+32FF
    CJK Compatibility: U+3300-U+33FF
  Braille
    Braille: U+2800-U+28FF
13 Special Areas and Format Characters
  Control Codes
    C0 Control Codes: U+0000-U+001F
    C1 Control Codes: U+0080-U+009F
  Layout Controls
    Layout Controls
  Deprecated Format Characters
    Deprecated Format Characters: U+206A-U+206F
  Surrogates Area
    Surrogates Area: U+D800-U+DFFF
  Private Use Area
    Private Use Area: U+E000-U+F8FF
  Specials
    Specials: U+FEFF, U+FFF0-U+FFFF
14 Code Charts
  Character Names List
    Images in the Code Charts and Character Lists
    Cross References
    Case Form Mappings
    Decompositions
    Information About Languages
    Reserved Characters
  CJK Unified Ideographs

  14.3 Hangul Syllables
Han Indices
  Han Radical-Stroke Index
  Shift-JIS Index
A Han Unification History
B Submitting New Characters
  B.1 Proposal Guidelines
  B.2 Requirements of Proposal Form and Process
    Interim Solutions
    Sending Proposals
C Relationship to ISO/IEC 10646
  C.1 History
  C.2 Encoding Forms in ISO/IEC 10646
    Zero Extending
  C.3 UCS Transformation Formats
    UTF-16
    UTF-8
  C.4 Synchronization of the Standards
  C.5 Identification of Features for the Unicode Standard
  C.6 Character Names
  C.7 Character Functional Specifications
D Changes from Unicode Version 2.0
  D.1 Versions of the Unicode Standard
  D.2 Changes from Unicode Version 2.0 to Version 2.1
    New Characters Added
    Character Semantics Changes
    Changes Affecting Conformance
  D.3 Changes from Unicode Version 2.1 to Version 3.0
    New Characters Added
    Character Semantics Changes
    Changes Affecting Conformance
Unicode Technical Reports
G Glossary
R References
  R.1 Source Standards
  R.2 Source Dictionaries for Han Unification
  R.3 Other Sources for the Unicode Standard
  R.4 Selected Resources
I Indices
  I.1 Unicode Names Index
  I.2 General Index


More information

Introductory logic and sets for Computer scientists

Introductory logic and sets for Computer scientists Introductory logic and sets for Computer scientists Nimal Nissanke University of Reading ADDISON WESLEY LONGMAN Harlow, England II Reading, Massachusetts Menlo Park, California New York Don Mills, Ontario

More information

ISO/IEC JTC 1/SC 2/WG 2 N2895 L2/ Date:

ISO/IEC JTC 1/SC 2/WG 2 N2895 L2/ Date: ISO International Organization for Standardization Organisation Internationale de Normalisation ISO/IEC JTC 1/SC 2/WG 2 Universal Multiple-Octet Coded Character Set (UCS) ISO/IEC JTC 1/SC 2/WG 2 N2895

More information

The Unicode Standard Version 6.0 Core Specification

The Unicode Standard Version 6.0 Core Specification The Unicode Standard Version 6.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Building Apps Last updated: 12 June 2017

Building Apps Last updated: 12 June 2017 Building Apps Last updated: 12 June 2017 Contents 1. Preparing content for your app... 3 1.1. Preparing your lexicon file... 3 1.2. Preparing images... 3 1.3. Preparing audio... 3 2. How to build your

More information

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7 ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646 1 Please fill all the sections A, B and C below. (Please read Principles and Procedures

More information

Title: Disposition of comments of ballot results on FPDAM-1 to ISO/IEC 14651:2001

Title: Disposition of comments of ballot results on FPDAM-1 to ISO/IEC 14651:2001 SC22/WG20 N938 Title: Disposition of comments of ballot results on FPDAM-1 to ISO/IEC 14651:2001 Date: 2002-06-11 Project: JTC 1.22.30.02.02 Source: Status: Alain LaBonté, Project editor, on behalf of

More information

ISO/IEC JTC 1/SC 2/WG 2 N3086 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS 1

ISO/IEC JTC 1/SC 2/WG 2 N3086 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS 1 TP PT Form for PT ISO/IEC JTC 1/SC 2/WG 2 N3086 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS 1 FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646TP Please fill all the sections A, B and C below. Please

More information

International Cataloging: Use Non-Latin Scripts

International Cataloging: Use Non-Latin Scripts OCLC Connexion Client Guides International Cataloging: Use Non-Latin Scripts Revised: September 2011 6565 Kilgour Place, Dublin, OH 43017-3395 www.oclc.org Revision History Date Section title Description

More information

ISO/IEC JTC1/SC2/WG2 N3244

ISO/IEC JTC1/SC2/WG2 N3244 Page 1 of 6 ISO/IEC JTC1/SC2/WG2 N3244 Title Review of CJK-C Repertoire Source UK National Body Document Type National Body Contribution Date 2007-04-14, revised 2007-04-20 The UK national body has carried

More information

Unicode character. Unicode JIS X 0213 GB *2. Unicode character *3. John Mauchly Short Order Code character. Unicode Unicode ASCII.

Unicode character. Unicode JIS X 0213 GB *2. Unicode character *3. John Mauchly Short Order Code character. Unicode Unicode ASCII. Unicode character 2004 2 19 1 ( ) John Mauchly Short Order Code 1949 *1 1967 ASCII ASCII (ISO 2022 Mule ) (Unicode ISO/IEC 10646 ) (IBM NEC ) (e (s-moro@hanazono.ac.jp) *1 Fortran 1957 GT ) Unicode JIS

More information

L2/ ISO/IEC JTC1/SC2/WG2 N4671

L2/ ISO/IEC JTC1/SC2/WG2 N4671 ISO/IEC JTC1/SC2/WG2 N4671 Date: 2015/07/23 Title: Proposal to include additional Japanese TV symbols to ISO/IEC 10646 Source: Japan Document Type: Member body contribution Status: For the consideration

More information

Template for comments and secretariat observations Date: Document: ISO/IEC 10646:2014 PDAM2

Template for comments and secretariat observations Date: Document: ISO/IEC 10646:2014 PDAM2 Template for s and secretariat observations Date: 014-08-04 Document: ISO/IEC 10646:014 PDAM 1 (3) 4 5 (6) (7) on each submitted GB1 4.3 ed Subclause title incorrectly refers to CJK ideographs. Change

More information

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete*** 1 of 5 3/3/2003 1:25 PM ****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete*** ISO INTERNATIONAL ORGANIZATION

More information

ISO/IEC JTC1/SC 22/WG 20 N

ISO/IEC JTC1/SC 22/WG 20 N ISO/IEC JTC1/SC 22/WG 20 N 619665 Date: 16 November 19981999-04-21 ISO ORGANISATION INTERNATIONALE DE NORMALISATION INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ˆ ˇ CEI (IEC) COMMISSION ÉLECTROTECHNIQUE

More information

RomanCyrillic Std v. 7

RomanCyrillic Std v. 7 https://doi.org/10.20378/irbo-52591 RomanCyrillic Std v. 7 Online Documentation incl. support for Unicode v. 9, 10, and 11 (2016 2018) UNi code A З PDF! Ѿ Sebastian Kempgen 2018 RomanCyrillic Std: new

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

UNICODE IDENTIFIER AND PATTERN SYNTAX

UNICODE IDENTIFIER AND PATTERN SYNTAX 1 of 21 1/29/2008 10:32 AM Technical Reports Proposed Update to Unicode Standard Annex #31 UNICODE IDENTIFIER AND PATTERN SYNTAX Version Unicode 5.1 (draft 6) Authors Mark Davis (mark.davis@google.com)

More information

Proposed Overhaul of kzvariant Data in the Unihan Database

Proposed Overhaul of kzvariant Data in the Unihan Database Proposed Overhaul of kzvariant Data in the Unihan Database John H. Jenkins 26 October 2015 The kzvariant data in the Unihan database is known to be of uneven quality. I recommend we resolve this problem

More information

UNIEDIT USER S GUIDE DUKE UNIVERSITY MULTILINGUAL TEXT EDITOR HUMANITIES COMPUTING FACILITY

UNIEDIT USER S GUIDE DUKE UNIVERSITY MULTILINGUAL TEXT EDITOR HUMANITIES COMPUTING FACILITY UNIEDIT MULTILINGUAL TEXT EDITOR USER S GUIDE HUMANITIES COMPUTING FACILITY DUKE UNIVERSITY Copyright Information COPYRIGHT 1998 BY THE HUMANITIES COMPUTING FACILITY, DUKE UNIVERSITY. ALL RIGHTS RESERVED.

More information

Network Working Group. Category: Informational July 1995

Network Working Group. Category: Informational July 1995 Network Working Group M. Ohta Request For Comments: 1815 Tokyo Institute of Technology Category: Informational July 1995 Status of this Memo Character Sets ISO-10646 and ISO-10646-J-1 This memo provides

More information

Information technology Keyboard layouts for text and office systems. Part 9: Multi-lingual, multiscript keyboard layouts

Information technology Keyboard layouts for text and office systems. Part 9: Multi-lingual, multiscript keyboard layouts INTERNATIONAL STANDARD ISO/IEC 9995-9 First edition 2016-10-01 Information technology Keyboard layouts for text and office systems Part 9: Multi-lingual, multiscript keyboard layouts Technologies de l

More information

The Unicode Standard Version 10.0 Core Specification

The Unicode Standard Version 10.0 Core Specification The Unicode Standard Version 10.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

UNICODE IDNA COMPATIBLE PREPROCESSSING

UNICODE IDNA COMPATIBLE PREPROCESSSING 1 of 12 1/23/2009 2:51 PM Technical Reports Proposed Draft Unicode Technical Standard #46 UNICODE IDNA COMPATIBLE PREPROCESSSING Version 1 (draft 1) Authors Mark Davis (markdavis@google.com), Michel Suignard

More information

1. Introduction 2. TAMIL DIGIT ZERO JTC1/SC2/WG2 N Character proposed in this document About INFITT and INFITT WG

1. Introduction 2. TAMIL DIGIT ZERO JTC1/SC2/WG2 N Character proposed in this document About INFITT and INFITT WG JTC1/SC2/WG2 N2741 Dated: February 1, 2004 Title: Proposal to add Tamil Digit Zero (DRAFT) Source: International Forum for Information Technology in Tamil (INFITT) Action: For consideration by UTC and

More information

Request for Comments: 3536 Category: Informational May Terminology Used in Internationalization in the IETF

Request for Comments: 3536 Category: Informational May Terminology Used in Internationalization in the IETF Network Working Group P. Hoffman Request for Comments: 3536 IMC & VPNC Category: Informational May 2003 Status of this Memo Terminology Used in Internationalization in the IETF This memo provides information

More information

The Unicode Standard Version 12.0 Core Specification

The Unicode Standard Version 12.0 Core Specification The Unicode Standard Version 12.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Proposed Update Unicode Standard Annex #9

Proposed Update Unicode Standard Annex #9 1 of 52 1/30/2015 11:23 AM Technical Reports Proposed Update Unicode Standard Annex #9 Version Unicode 8.0.0 (draft 4) Editors Date 2015-01-07 This Version Previous Version Latest Version Latest Proposed

More information

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document.

ISO/IEC JTC 1/SC 2/WG 2 Proposal summary form N2652-F accompanies this document. Dated: April 28, 2006 Title: Proposal to add TAMIL OM Source: International Forum for Information Technology in Tamil (INFITT) Action: For consideration by UTC and ISO/IEC JTC 1/SC 2/WG 2 Distribution:

More information

Proposed Update Unicode Technical Standard #10

Proposed Update Unicode Technical Standard #10 of 69 7/14/2010 12:04 PM Technical Reports Proposed Update Unicode Technical Standard #10 Version 6.0.0 draft 5 Authors Editors Mark Davis (markdavis@google.com), Ken Whistler (ken@unicode.org) Date 2010-07-09

More information

UNICODE SUPPORT FOR MATHEMATICS

UNICODE SUPPORT FOR MATHEMATICS Technical Reports UTC-Review: Unicode Technical Report #25 UNICODE SUPPORT FOR MATHEMATICS Version 1.0 Authors Date This Version Previous Version Latest Version Barbara Beeton (bnb@ams.org), Asmus Freytag

More information

The Use of Unicode in MARC 21 Records. What is MARC?

The Use of Unicode in MARC 21 Records. What is MARC? # The Use of Unicode in MARC 21 Records Joan M. Aliprand Senior Analyst, RLG What is MARC? MAchine-Readable Cataloging MARC is an exchange format Focus on MARC 21 exchange format An implementation may

More information

The Unicode Standard Version 6.2 Core Specification

The Unicode Standard Version 6.2 Core Specification The Unicode Standard Version 6.2 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

German National Body comment on SC 2 N4052 Date: Document: WG2 N3592-Germany

German National Body comment on SC 2 N4052 Date: Document: WG2 N3592-Germany German National Body on SC N405 Date: 009-03-11 Document: WG N359-Germany 1 (3) 4 5 (6) (7) DE te (1) Kana on each submitted Germany recommends the addition the character U+1B000 KATAKANA LETTER ARCHAIC

More information

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057 ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057 Date: 1999-06-22 ISO/IEC JTC 1/SC 2 CODED CHARACTER SETS SECRETARIAT: JAPAN (JISC) DOC TYPE: TITLE: SOURCE: Other document National Body Comments on SC 2 N 3297, WD

More information