ISO/IEC JTC 1/SC 2
  Yoshiki MIKAMI, SC 2 Chair
  Toshiko KIMURA, SC 2 Secretariat
  2012 JTC 1 Plenary, 2012/11/05-10, Jeju
What is new
  Work items
    ISO/IEC 10646: 2nd ed., 3rd ed. (2012)
    ISO/IEC 14651: Amd. 1-2
  Participants (figures in parentheses: last plenary)
    P-members: 30 (29)
    O-members: 21 (22)
  Liaisons
    No change from the last plenary
  The wall at Jehan Rictus square, Paris, written in 300 different languages
50 years of character code standard development by SC 2
  1967  ISO 646
  1973  ISO 2022
  1987  ISO 8859
  1993  ISO 10646 (65,536 = 2^16)
ISO 646:1967: only 12 code points for I18N
  [Code chart: ISO 646 7-bit code table, columns 0-7, rows 0-F; grid layout lost in extraction]
  National variants at the 12 national-use positions (columns: DE BE DA ES FR IT; rows with fewer than six values lost their empty cells in extraction)
    2/3  # #
    2/4  $ $ $ $ $ $
    4/0  à É à
    5/B  Ä º Æ º º
    5/C  Ö ç Ø Ñ ç ç
    5/D  Ü Å é
    5/E  ^ ^ Ü ^ ^ ^
    6/0  ` ` é ` ` ù
    7/B  Ä é æ º é à
    7/C  Ö ij ø ñ ù ò
    7/D  Ü è å ç è è
    7/E  ß ü ~ ì ì
  Only 12 code points are available for internationalization (I18N). The scope is limited to Latin alphabet users.
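The effect of the national-use positions can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of any national standard: the table holds only positions 5/C and 7/C, with the DA and FR values taken from the complete rows of the table above, and the names `NATIONAL_USE` and `decode_iso646` are hypothetical.

```python
# Partial tables for two of the 12 national-use positions (5/C = 0x5C, 7/C = 0x7C).
# Assumption: only these two positions are modeled; all other bytes are treated
# as plain 7-bit characters via chr().
NATIONAL_USE = {
    "IRV": {0x5C: "\\", 0x7C: "|"},
    "DA": {0x5C: "Ø", 0x7C: "ø"},
    "FR": {0x5C: "ç", 0x7C: "ù"},
}


def decode_iso646(data: bytes, variant: str) -> str:
    """Decode 7-bit data, substituting the national-use positions of `variant`."""
    table = NATIONAL_USE[variant]
    return "".join(table.get(byte, chr(byte)) for byte in data)


payload = bytes([0x5C, 0x7C])  # the same two bytes on the wire
for variant in NATIONAL_USE:
    print(variant, decode_iso646(payload, variant))
```

The same byte sequence reads differently under each variant, which is why the receiver has to know which national version the sender used.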
Localization efforts continued through national SDOs, 1960s-1990s
  (Country/script: standards in chronological order; the decade columns were lost in extraction)
  Latin/USA: ASCII, ISO 646
  Latin/Europe: ECMA-6, extension to European languages
  Cyrillic: GOST 13052, extension to other languages
  Arabic: CODAR-U, ASMO 449
  Hebrew: ECMA 121
  Japan: JIS C 6220, JIS C 6226, JIS X 0212
  China: GB 2312, GB 13000
  Korea: KS C 5601, KS X 1005
  Thailand: TIS 620
  India: ISCII 83, IS 13194
  Vietnam: TCVN 5412
  Sri Lanka: SLS 1134
  and more
ISO 2022:1973 introduced code switching
  ISO-IR registration history
    Year  Latin-based / Other scripts
    1975  Latin-based: ISO 646 IRV, BSI 4730, ANSI X3.4, NATS, SEN 85020, DIN 66003, NF Z 62-010; other scripts: JIS C 6220 (Japanese Kana)
    1976  Other scripts: ISO 5428 (Greek)
    1979  Latin-based: DIN 31624; other scripts: ISO 5427 (Cyrillic), ISO 6438 (African)
    1982  Latin-based: ISO 5426, NS 4551; other scripts: CODAR-U (Arabic), GB 2312 (Hanzi)
    1984  Other scripts: ASMO 449 (Arabic)
    1987  Latin-based: CSN 369103; other scripts: ECMA113 (Hebrew)
    1988  Other scripts: KS C 5601 (Hangul)
    1992  Other scripts: TIS 620 (Thai)
  [Code chart: ASMO 449 = ISO 9036 (Arabic 7-bit code), columns 0-7, rows 0-F; grid layout lost in extraction]
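Code switching can be observed with Python's built-in `iso2022_jp` codec, used here as a convenient stand-in for an ISO 2022-based encoding; the slide itself does not mention this codec. The sketch only shows that escape sequences are inserted to switch between coded character sets within one byte stream.

```python
ESC = b"\x1b"

text = "ISO \u3042"  # ASCII text followed by HIRAGANA LETTER A
encoded = text.encode("iso2022_jp")

# Each ESC byte starts an escape sequence that designates a different
# coded character set for the bytes that follow it.
switches = encoded.count(ESC)
print(encoded)
print("escape sequences in stream:", switches)

# Decoding needs the same stateful interpretation to recover the text.
print(encoded.decode("iso2022_jp") == text)
```

The decoder has to track which set is currently designated, so a byte's meaning depends on everything that came before it in the stream.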
ISO 8859 series (1987): 8-bit multilingual codes
  Part            Languages
  Latin-1         West European languages
  Latin-2         East European languages
  Latin-3         Esperanto, Maltese, etc.
  Latin-4         Scandinavian languages
  Latin-Cyrillic  Slavic languages
  Latin-Arabic    Arabic
  Latin-Hebrew    Hebrew
  Latin-5         Turkish, Icelandic, etc.
  Latin-6         Faroese, Irish, etc.
  Latin-7         Greenlandic, Sami
  Latin-8         Celtic languages
  but the scope is still limited.
  [Code chart: ISO 8859-1 (Latin-1), columns 0-F, rows 0-F; grid layout lost in extraction]
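The limitation can be illustrated with Python's standard ISO 8859 codecs. This is a minimal sketch: the byte 0xC6 and the helper name `encodable` are arbitrary choices for illustration.

```python
RAW = bytes([0xC6])  # one byte from the upper half of the 8-bit table

PARTS = {
    "Latin-1": "iso8859_1",
    "Latin-2": "iso8859_2",
    "Latin/Cyrillic": "iso8859_5",
}

# The same byte decodes to a different character in each part.
decoded = {name: RAW.decode(codec) for name, codec in PARTS.items()}
for name, ch in decoded.items():
    print(f"{name}: 0xC6 -> {ch} (U+{ord(ch):04X})")


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A text mixing a Latin-1 letter and a Cyrillic letter fits in none of the parts.
mixed = decoded["Latin-1"] + decoded["Latin/Cyrillic"]
print({name: encodable(mixed, codec) for name, codec in PARTS.items()})
```

Each part covers its own group of languages, but a document that mixes groups cannot be represented in any single part.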
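ISO/IEC 10646:1993 Universal Coded Character Set (UCS)
  Sample of the code table (row = high octet, cell = low octet; cells shown: 00 01 02, 20 21 22, 40 41 42, FD FE FF)
    Row 00: 21 !  22 "  40 @  41 A  42 B  FD ý  FE þ  FF ÿ
    Row 01: Ā ā Ă  Ġ ġ Ģ  ŀ Ł ł  ǽ Ǿ ǿ
    Row 06: 21 ء  22 آ  41 ف  42 ق
    Row 09: 20 ठ  21 ड  22 ढ
    Row 30: 21 〡  22 〢  41 ぁ  42 あ  FD ヽ  FE ヾ
    Row 4E: 一 丁 丂  丠 両 丢  乀 乁 乂  份 仾 仿
    Row 4F: 伀 企 伂  传 伡 伢  佀 佁 佂  俽 俾 俿
    :
    Row AC: 가 각 갂  갠 갡 갢  걀 걁 걂  곽 곾 곿
  Basic Multilingual Plane (BMP): 256 x 256 = 65,536 characters
  Planes: BMP, SMP, SIP
  Every character is uniquely identified without code switching or any other additional mechanism.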
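The row and cell addressing of the BMP follows directly from the 256 x 256 layout. A minimal Python sketch, in which the function name `row_cell` is hypothetical:

```python
def row_cell(ch: str) -> tuple[int, int]:
    """Split a BMP character's code point into (row, cell), each 0..255."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the Basic Multilingual Plane")
    return divmod(code_point, 256)


for ch in ("A", "\u3042", "\u4e00", "\uac00"):
    row, cell = row_cell(ch)
    print(f"U+{ord(ch):04X}: row {row:02X}, cell {cell:02X}")

# 256 rows of 256 cells give the BMP capacity.
print(256 * 256 == 2 ** 16 == 65_536)
```

Because each character has exactly one (row, cell) position, no surrounding state is needed to interpret it.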
Expansion of UCS after the 1990s
  ISO/IEC 10646-1:1993 / Unicode V1.1
  ISO/IEC 10646-1:2000 / Unicode V3.0
  ISO/IEC 10646:2003 (Amd. 1-8) / Unicode V4.0
  ISO/IEC 10646:2011 / Unicode V6.0
  Basic Multilingual Plane (BMP): 256 x 256 = 65,536
  Supplementary Planes: max 1,024² = 1,048,576 (about 1,048K)
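The 1,024² figure corresponds to 16 supplementary planes of 65,536 code points each. In UTF-16 these are reached by pairing one of 1,024 high surrogates with one of 1,024 low surrogates; the slide does not mention UTF-16, so the following Python sketch is added as background, with hypothetical function names.

```python
def plane(code_point: int) -> int:
    """Plane number: 0 is the BMP, 1..16 are the supplementary planes."""
    return code_point >> 16


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Map a supplementary code point to its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# 1,024 high x 1,024 low surrogates = 16 planes x 65,536 code points.
print(1024 * 1024 == 16 * 65_536 == 1_048_576)

emoji = 0x1F600
high, low = to_surrogates(emoji)
print(f"U+{emoji:X}: plane {plane(emoji)}, surrogates {high:04X} {low:04X}")
```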
Universal Declaration of Human Rights in 368 languages
  This fully digitized basic human rights document can be viewed in 368 languages at http://www.unicode.org/udhr/
User engagement
  Participation
    SC 2 and SC 2/WG 2 maintain continuous contact with the user communities of the scripts
    Established Category C liaison with UC Berkeley in the field of minority and historic scripts
  Requirements addressed
    New wave: emoji (pictograms for mobile communication)
    Minorities: Sindhi, Mro, Bassa Vah, etc.
    Historians: Oracle Bone Script, Palmyrene, etc.
  Challenges
    No explicit challenges from outside SC 2 or JTC 1
    Harmonization of national standards with ISO/IEC 10646
Known implementations
  Web pages and SNS
    271 languages found on Wikipedia
    Domain names: IDNs
    SNSs cover an increasing number of languages: Facebook 74 languages, YouTube 51 languages
  Search engines
    Google indexes pages in more than 50 languages and offers 150 interface languages
  Word processing, printing
    Typical software can handle all languages
    Ideographic Variation Sequence (IVS) technology is expected to overcome the incompatibility of user-defined characters
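An IVS is a base ideograph followed by a variation selector from the range U+E0100 to U+E01EF. The Python sketch below only detects such pairs in a string; whether a given pair is actually registered in the Ideographic Variation Database is not checked, and the sample pair and function names are illustrative assumptions.

```python
import unicodedata

IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF  # VARIATION SELECTOR-17 .. -256


def is_ivs_selector(ch: str) -> bool:
    """True if `ch` is a variation selector from the supplementary range."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST


def variation_sequences(text: str) -> list[tuple[str, str]]:
    """Return (base, selector) pairs found in `text`."""
    pairs = []
    for base, following in zip(text, text[1:]):
        if is_ivs_selector(following) and not is_ivs_selector(base):
            pairs.append((base, following))
    return pairs


# Illustrative sample: an ideograph followed by the first selector in the range.
sample = "\u845b\U000E0100"
print(len(sample))  # two code points, intended to render as one glyph variant
for base, selector in variation_sequences(sample):
    print(f"U+{ord(base):04X} + {unicodedata.name(selector)}")
```

Because the variant is expressed with standard code points rather than a private-use character, the text stays interchangeable even where the selector is ignored and the base glyph is shown.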
Challenges
  Potential requests from user groups of minority and historic scripts remain strong and are increasing
  Implementation of UCS: the standard is not the goal but the starting line for actual usage
    How can we accelerate the implementation process?
    Some historic scripts are not well studied or not yet deciphered
  WSIS Action Line C8: cultural and linguistic diversity is essential to the development of an Information Society
    Governments should promote the development of standard character sets, electronic dictionaries, multilingual search engines, machine translation tools, IDNs, etc.
Issues or needs
  Resource constraints
    Total number of P-/O-members > 50
    Developing-world members often lack resources to participate and implement
    Many countries have no official membership status but have scripts to be coded
  Assistance needed
    Collaboration with DEVCO to be explored
2013 Directions
  Strategic characteristics [ISO/IEC Directives, JTC 1 Supplement, 2.1.2]
    Interoperability
    Portability
    Cultural and linguistic adaptability
    Accessibility
  SC 2 Plenary Meeting: 2012-10, Chiang Mai, Thailand
  International Mother Language Day (February 21)