Unicode character. Unicode JIS X 0213 GB 2. Unicode character 3. John Mauchly Short Order Code character. Unicode Unicode ASCII.


Unicode character ( )

John Mauchly Short Order Code 1949 * ASCII ASCII (ISO 2022 Mule ) (Unicode ISO/IEC ) (IBM NEC )

Unicode JIS X 0213 GB *2 Unicode character *3 character Unicode Unicode Unicode TRON ( ) Unicode Version 4.0 ([2])

(s-moro@hanazono.ac.jp)
*1 Fortran 1957 GT
*2 Unicode GB18030 Unicode [8]
*3 character Unicode character

2 character Unicode character

2 10 Unicode Design Principles ([2], p. 14)

1. Universality
2. Efficiency
3. Characters, Not Glyphs
4. Semantics
5. Plain Text
6. Logical Order
7. Unification
8. Dynamic Composition
9. Equivalent Sequences
10. Convertibility

Unicode 1 Universality Version 3.2 Sixteen-bit character code ([9] ) B Unification ([5]) ( ) Characters, Not Glyphs Semantics Logical Order Unification

2.2 character character 3 Characters, Not Glyphs character

The Unicode Standard draws a distinction between characters and glyphs. Characters are the abstract representations of the smallest components of written language that have semantic value. (...) Characters are represented by code points. (...) The Unicode Standard deals only with character codes. ([2], p. 15)

Unicode character character Unicode Abstract Character ([2], p. 64) character ([2], p. 1365) Abstract Character

Abstract Character: A unit of information used for the organization, control, or representation of textual data. When representing data, the nature of that data is generally symbolic as opposed to some other kind of data (for example, aural or visual). Examples of such symbolic data include letters, ideographs, digits, punctuation, technical symbols, and dingbats. An abstract character has no concrete form and should not be confused with a glyph. An abstract character does not necessarily correspond to what a user thinks of as a character and should not be confused with a grapheme. The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters. Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.

Unicode character *4 symbolic character semantic value semantic value 4

Characters have well-defined semantics. Character property tables are provided for use in parsing, sorting, and other algorithms requiring semantic knowledge about the code points. The properties identified by the Unicode Standard include numeric, spacing, combination, and directionality properties (...). Additional properties may be defined as needed from time to time. ([2], pp )

character semantic value semantics

*4 semantics semantics well-defined semantics

2.3 Unification

script character Unification *5

The Unicode Standard avoids duplicate encoding of characters by unifying them within scripts across languages; characters that are equivalent are given a single code. (...) Avoidance of duplicate encoding of characters is important to avoid visual ambiguity. ([2], pp )

language script

Script. A collection of symbols used to represent textual information in one or more writing systems. ([2], p. 1377)

(language) = (writing system) script script x x script

*5 Unification
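The Semantics principle quoted above names numeric, combination, and directionality properties, and the Abstract Character definition mentions combining character sequences. A minimal sketch of how both can be inspected, assuming Python's standard `unicodedata` module as the property source; the helper `describe` and the sample characters are illustrative choices, not taken from the paper:

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode character properties for one code point.

    `describe` is a hypothetical helper written for this illustration.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),         # general category
        "numeric": unicodedata.numeric(ch, None),     # numeric property
        "combining": unicodedata.combining(ch),       # combining class
        "bidi": unicodedata.bidirectional(ch),        # directionality
    }


# One abstract character, two encoded forms: a precomposed code point
# and a base letter followed by a combining mark.
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
sequence = "e\u0301"     # "e" + COMBINING ACUTE ACCENT

if __name__ == "__main__":
    # Latin capital, Arabic-Indic digit, combining accent, Hebrew letter.
    for ch in ("A", "\u0661", "\u0301", "\u05D0"):
        print(describe(ch))

    # The two forms differ as code point sequences ...
    print(precomposed == sequence)
    # ... but are equal after canonical composition (NFC).
    print(unicodedata.normalize("NFC", sequence) == precomposed)
```

The properties are attached to code points, not to glyphs: nothing in the output depends on a font or on how the character is drawn.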

script script Unicode script Unicode TRON script language visual ambiguity character *6 semantics well-defined character

2.4 Unicode 7

Unicode text is stored in logical order in the memory representation, roughly corresponding to the order in which text is typed in via the keyboard. In some circumstances, the order of characters differs from this logical order when the text is displayed or printed. (...) For the most part, logical order corresponds to phonetic order. ([2], pp )

Unicode (logical order) (phonetic order) ( ) / / / character logical *7

3 character

3.1 character

AI character

*6 ( ) (A )
*7 Gelb theory of writing Gelb IPA ([1]) Gelb ([3])
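The Logical Order quotation above can be illustrated with a string that mixes left-to-right and right-to-left letters. This is a minimal sketch, assuming Python's `unicodedata` module; the sample string and the helper `logical_order` are hypothetical and only show the memory order together with each character's bidirectional class. Reordering for display is the renderer's job and is not implemented here.

```python
import unicodedata

# "abc", a space, then the Hebrew letters alef, bet, gimel,
# written in the order they would be typed.
text = "abc \u05D0\u05D1\u05D2"


def logical_order(s: str) -> list:
    """Return (index, code point, bidi class) in memory (logical) order."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
        for i, ch in enumerate(s)
    ]


if __name__ == "__main__":
    for index, code_point, bidi_class in logical_order(text):
        print(index, code_point, bidi_class)
```

The Hebrew letters keep increasing indices in memory even though a display engine would lay that run out from right to left; the stored sequence follows typing order, which is what the standard calls logical order.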

([4], p. 276) ([4], p. 365) character (the smallest components of written language) ( ) character logical order Unicode ([3]) character *8

3.2 character

*8 [7] character 1 (dissémination) ( 1) ( ) 1 1

character Unicode character 4 Unicode character character character CHISE *9 Chaon ([7])

[1] Ignace J. Gelb. A Study of Writing. University of Chicago Press, revised edition,
[2] The Unicode Consortium. The Unicode Standard, Version 4.0. Addison-Wesley, Boston,
[3].., 1972.,
[4] L...,,
[5]. ISO/IEC Unicode., Vol. 2,,
[6]. CHISE Project., Vol. 4,,
[7]. Surface or essence: Beyond the coded character set model. 21 COE, ( )..
[8]. GB18030., Vol. 2,,
[9]. Unicode 4.0., Vol. 4,,

*9 projects/chise/ chise/ [6]
