ISO/IEC JTC 1/SC 2. Yoshiki MIKAMI, SC 2 Chair Toshiko KIMURA, SC 2 Secretariat JTC 1 Plenary 2012/11/05-10, Jeju



What is new
  Work items
    - ISO/IEC 10646: 2nd ed., 3rd ed. (2012)
    - ISO/IEC 14651: Amd. 1-2
  Participants (figures in parentheses: last plenary)
    - P-members: 30 (29)
    - O-members: 21 (22)
  Liaisons
    - No change from the last plenary
  Photo: the wall at Jehan Rictus square, Paris, written in 300 different languages

50 years of character code standard development by SC 2
  1967  ISO 646
  1973  ISO 2022
  1987  ISO 8859
  1993  ISO 10646
  65,536 = 2^16

ISO 646:1967: only 12 code points for I18N
  Code chart (columns 2-7, rows 0-F; blank cells are the national-use positions):
         2   3   4   5   6   7
    0    SP  0       P       p
    1    !   1   A   Q   a   q
    2    "   2   B   R   b   r
    3        3   C   S   c   s
    4        4   D   T   d   t
    5    %   5   E   U   e   u
    6    &   6   F   V   f   v
    7    '   7   G   W   g   w
    8    (   8   H   X   h   x
    9    )   9   I   Y   i   y
    A    *   :   J   Z   j   z
    B    +   ;   K       k
    C    ,   <   L       l
    D    -   =   M       m
    E    .   >   N       n
    F    /   ?   O   _   o   DEL
  National-use positions (variants DE, BE, DA, ES, FR, IT; rows with fewer than six values lost cells in extraction, so their values are listed in order rather than by column):
    2/3  # #
    2/4  $ $ $ $ $ $
    4/0  à É à
    5/B  Ä º Æ º º
    5/C  Ö ç Ø Ñ ç ç
    5/D  Ü Å é
    5/E  ^ ^ Ü ^ ^ ^
    6/0  ` ` é ` ` ù
    7/B  ä é æ º é à
    7/C  ö ij ø ñ ù ò
    7/D  ü è å ç è è
    7/E  ß ü ~ ì ì
  Only 12 code points are available for internationalization (I18N). The scope is limited to Latin alphabet users.
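The effect of the national-use positions can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: `decode_iso646` and `GERMAN_OVERRIDES` are hypothetical names, and the override table lists the German assignments (DIN 66003) for the positions that differ from ASCII.

```python
# Hypothetical helper: read the same 7-bit bytes under two ISO 646 variants.
# Only the national-use positions differ; every other position is shared.

# German variant (DIN 66003): characters placed in the national-use positions.
GERMAN_OVERRIDES = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def decode_iso646(data: bytes, overrides: dict[int, str]) -> str:
    """Decode 7-bit data, substituting the variant's national-use characters."""
    return "".join(overrides.get(b, chr(b)) for b in data)


data = b"Gr}~e [rzte"
as_us = decode_iso646(data, {})                # US variant (ASCII): no overrides
as_de = decode_iso646(data, GERMAN_OVERRIDES)  # German variant

print(as_us)
print(as_de)
```

The bytes never change; only the reader's choice of variant does, which is why text exchanged between countries could come out with brackets and braces where accented letters were intended.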

Localization efforts carried on by national SDOs
  Timeline by country/script, 1960s to 1990s (standards listed left to right along the timeline):
    Latin/USA      ASCII, ISO 646
    Latin/Europe   ECMA-6; extension to European languages
    Cyrillic       GOST 13052; extension to other languages
    Arabic         CODAR-U, ASMO 449
    Hebrew         ECMA 121
    Japan          JIS C 6220, JIS C 6226, JIS X 0212
    China          GB 2312, GB 13000
    Korea          KS C 5601, KS X 1005
    Thailand       TIS 620
    India          ISCII 83, IS 13194
    Vietnam        TCVN 5412
    Sri Lanka      SLS 1134
    and more

ISO 2022:1973 introduced code switching
  ISO-IR registration history (Latin-based sets and other scripts, by year):
    1975  ISO 646 IRV, BSI 4730, ANSI X3.4, NATS, SEN 85020, DIN 66003, NF Z 62-010; JIS C 6220 (Japanese Kana)
    1976  ISO 5428 (Greek)
    1979  DIN 31624; ISO 5427 (Cyrillic), ISO 6438 (African)
    1982  ISO 5426, NS 4551; CODAR-U (Arabic), GB 2312 (Hanzi)
    1984  ASMO 449 (Arabic)
    1987  CNS 369103, ECMA 113 (Hebrew)
    1988  KS C 5601 (Hangul)
    1992  TIS 620 (Thai)
  Code chart: ASMO 449 = ISO 9036 (7-bit). Columns 2-3 hold punctuation and digits; column 4 runs @ ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د; column 5 begins ذ ر ز س ش ص ض ط ظ ع غ; column 6 continues with ف ق ك ل م ن ه و ى ي.
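Code switching can be observed with Python's built-in `iso2022_jp` codec, one ISO 2022-based encoding. This is a small sketch and the slide does not reference this codec: the escape sequences re-designate which character set the following 7-bit bytes belong to.

```python
# Sketch: ISO 2022 code switching as seen through Python's "iso2022_jp" codec.
text = "ABC\u3042\u3044"  # "ABC" followed by two hiragana letters

encoded = text.encode("iso2022_jp")
print(encoded)

ESC = b"\x1b"
to_jisx0208 = ESC + b"$B"  # escape sequence: switch G0 to JIS X 0208
to_ascii = ESC + b"(B"     # escape sequence: switch G0 back to ASCII

# Everything before the first escape sequence is plain 7-bit ASCII.
switch_at = encoded.index(to_jisx0208)
print("ASCII part:", encoded[:switch_at])

# The decoder must track the current designation to interpret each byte.
decoded = encoded.decode("iso2022_jp")
print(decoded == text)
```

A reader that drops into the middle of such a stream cannot interpret the bytes without scanning back to the last escape sequence, which is the burden the later UCS design removes.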

ISO 8859 series (1987-): 8-bit multilingual codes
  8859 part        Languages
    Latin-1        West European languages
    Latin-2        East European languages
    Latin-3        Esperanto, Maltese, etc.
    Latin-4        Scandinavian languages
    Latin-Cyrillic Slavic languages
    Latin-Arabic   Arabic
    Latin-Hebrew   Hebrew
    Latin-5        Turkish, Icelandic, etc.
    Latin-6        Faroese, Irish, etc.
    Latin-7        Greenlandic, Sami
    Latin-8        Celtic languages
  ...but the scope is still limited.
  Code chart: ISO 8859-1 (Latin-1), 16 x 16. Columns 2-7 repeat the ASCII graphic characters; columns A-F add symbols and Western European letters, from À Ð à ð in row 0 to Ï ß ï ÿ in row F.
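The limitation of the 8859 parts can be illustrated with Python's standard codecs. This is a sketch added for illustration; the choice of byte 0xE9 is arbitrary. The lower half is shared by every part, while the upper half depends on which part the reader assumes.

```python
# Sketch: one byte value, four ISO 8859 parts, four different characters.
parts = {
    "Latin-1": "iso8859_1",
    "Latin/Cyrillic": "iso8859_5",
    "Latin/Arabic": "iso8859_6",
    "Latin/Hebrew": "iso8859_8",
}

high_byte = bytes([0xE9])  # upper half: meaning depends on the part
low_byte = bytes([0x41])   # lower half: same in every part

decoded_high = {name: high_byte.decode(codec) for name, codec in parts.items()}
decoded_low = {name: low_byte.decode(codec) for name, codec in parts.items()}

for name in parts:
    ch = decoded_high[name]
    print(f"{name:15} 0xE9 -> U+{ord(ch):04X}   0x41 -> {decoded_low[name]}")
```

Because a byte stream carries no indication of the part in use, a document can hold Latin plus one other script at most, and the part has to be communicated out of band.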

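ISO/IEC 10646:1993 Universal Coded Character Set
  Sample of the Basic Multilingual Plane (row = high octet, cell = low octet):
    Row 00: cells 21-22 ! "; cells 40-42 @ A B; cells FD-FF ý þ ÿ
    Row 01: cells 00-02 Ā ā Ă; cells 20-22 Ġ ġ Ģ; cells 40-42 ŀ Ł ł; cells FD-FF ǽ Ǿ ǿ
    Row 06: cells 21-22 ء آ; cells 41-42 ف ق
    Row 09: cells 20-22 ठ ड ढ
    Row 30: cells 21-22 〡 〢; cells 41-42 ぁ あ; cells FD-FE ヽ ヾ
    Row 4E: cells 00-02 一 丁 丂; cells 20-22 丠 両 丢; cells 40-42 乀 乁 乂; cells FD-FF 份 仾 仿
    Row 4F: cells 00-02 伀 企 伂; cells 20-22 传 伡 伢; cells 40-42 佀 佁 佂; cells FD-FF 俽 俾 俿
    Row AC: cells 00-02 가 각 갂; cells 20-22 갠 갡 갢; cells 40-42 걀 걁 걂; cells FD-FF 곽 곾 곿
  Basic Multilingual Plane (BMP): 256 x 256 = 65,536 characters
  Planes shown: BMP, SMP, SIP
  Every character is uniquely identified, with no code switching or other additional mechanism.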
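The row and cell addressing of the BMP can be checked directly from Python's code point values. This is a minimal sketch; `row_cell` is a hypothetical helper name.

```python
# Sketch: split a BMP code point into its row (high octet) and cell (low octet).
def row_cell(ch: str) -> tuple[int, int]:
    """Return (row, cell) for a single character in the Basic Multilingual Plane."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("character lies outside the BMP")
    return cp >> 8, cp & 0xFF


# One sample character from several rows of the chart.
for ch in "A\u0100\u0621\u3042\u4e00\uac00":
    row, cell = row_cell(ch)
    print(f"U+{ord(ch):04X}  row {row:02X}  cell {cell:02X}")

bmp_size = 256 * 256  # 256 rows of 256 cells
print(bmp_size)
```

Each character has exactly one (row, cell) pair, so no escape sequence or external label is needed to interpret it.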

Expansion of UCS after the 1990s
  ISO/IEC 10646-1:1993       Unicode V1.1
  ISO/IEC 10646-1:2000       Unicode V3.0
  ISO/IEC 10646:2003         Unicode V4.0
    Amd. 1-8
  ISO/IEC 10646:2011         Unicode V6.0
  Basic Multilingual Plane (BMP): 256 x 256 = 65,536
  Supplementary planes: max 1,024² = 1,048K
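The 1,024² figure matches the UTF-16 surrogate mechanism, in which 1,024 high surrogates combine with 1,024 low surrogates to address the supplementary planes. The slide does not name the mechanism, so the following Python sketch is an interpretation; `to_surrogates` is a hypothetical helper.

```python
# Sketch: map a supplementary-plane code point to a UTF-16 surrogate pair.
def to_surrogates(cp: int) -> tuple[int, int]:
    """Return (high, low) surrogates for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000        # 20-bit offset into the supplementary space
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


supplementary_size = 1024 * 1024  # 1,048,576 code points, i.e. 16 planes
high, low = to_surrogates(0x1F600)  # an emoji code point
print(f"{high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder.
utf16 = "\U0001F600".encode("utf-16-be")
print(utf16.hex())
```

Together with the BMP this gives 17 planes, which is why the code space ends at U+10FFFF.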

Universal Declaration of Human Rights in 368 languages
  - This basic human rights document is fully digitized and can be viewed in 368 languages
  - To view, please visit http://www.unicode.org/udhr/

User engagement
  Participation
    - SC 2 and SC 2/WG 2 keep continuous contact with the user communities of the scripts
    - Established a C liaison with UC Berkeley in the field of minority and historic scripts
  Requirements addressed
    - New wave: emoji (pictograms for mobile communication)
    - Minorities: Sindhi, Mro, Bassa Vah, etc.
    - Historians: Oracle Bone Script, Palmyrene, etc.
  Challenges
    - No explicit challenges from outside SC 2 or JTC 1
    - Harmonization of national standards with ISO/IEC 10646

Known implementations
  Web pages and SNS
    - 271 languages found on Wikipedia
    - Domain names: IDNs
    - SNSs cover an increasing number of languages: Facebook 74 languages, YouTube 51 languages
  Search engines
    - Google indexes pages in more than 50 languages and offers 150 interface languages
  Word processing, printing
    - Typical software can handle all languages
    - Ideographic Variation Sequence (IVS) technology is expected to overcome the incompatibility of user-defined characters
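An IVS is a base ideograph followed by a variation selector from the range U+E0100 to U+E01EF. The following Python sketch only detects that pattern; `find_ivs` is a hypothetical helper, and it does not check whether a given pair is actually registered in the Ideographic Variation Database.

```python
# Sketch: locate Ideographic Variation Sequences (base + variation selector).
VS17 = 0xE0100   # first selector of the Variation Selectors Supplement
VS256 = 0xE01EF  # last selector of the Variation Selectors Supplement


def find_ivs(text: str) -> list[tuple[str, int]]:
    """Return (base character, selector code point) for each sequence found."""
    found = []
    for base, following in zip(text, text[1:]):
        if VS17 <= ord(following) <= VS256:
            found.append((base, ord(following)))
    return found


# U+845B followed by VS17, then an ordinary ideograph with no selector.
sample = "\u845b\U000E0100\u57ce"
sequences = find_ivs(sample)
for base, selector in sequences:
    print(f"U+{ord(base):04X} + U+{selector:05X}")
```

Because the selector travels with the text as an ordinary code point, a glyph variant survives interchange in a way that a private, user-defined character does not.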

Challenges
  - Potential requests from user groups of minority and historic scripts are still strong and increasing
  - Implementation of UCS: the standard is not the goal but the starting line for actual usage
    - How can we accelerate the implementation process?
  - Historic scripts that are not well studied or not yet deciphered
  WSIS Action Lines, C8
    - Cultural and linguistic diversity is essential to the development of an Information Society
    - Governments should promote the development of standard character sets, electronic dictionaries, multilingual search engines, machine translation tools, IDNs, etc.

Issues or needs
  Resource constraints
    - Total number of P-/O-members > 50
    - Developing-world members often lack resources to participate and implement
    - Many countries have no official membership status but have scripts to be coded
  Assistance needed
    - Collaboration with DEVCO to be explored

2013 directions
  Strategic characteristics [ISO/IEC Directives, JTC 1 Supplement, 2.1.2]
    - Interoperability
    - Portability
    - Cultural and linguistic adaptability
    - Accessibility
  SC 2 Plenary Meeting: 2012-10, Chiang Mai, Thailand
  International Mother Language Day (February 21)