Designing & Developing Pan-CJK Fonts for Today

Similar documents
ATypI Hongkong Development of a Pan-CJK Font

The Power of Plain Text & the Importance of Meaningful Content Dr. Ken Lunde Senior Computer Scientist Adobe Systems Incorporated

The Adobe-Japan1-6 Character Collection

Building Source Han Sans & Noto Sans CJK

The Adobe-CNS1-6 Character Collection

Bookmarks for PDF Output(Outline-Group)

Ideographic Variation Sequences

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 10.0 Core Specification

Recent Trends in Standardization of Japanese Character Codes

ISO/IEC/JTC 1/SC 2/WG 2/IRG Ideographic Rapporteur Group (IRG) Resolutions of IRG Meeting #31

Legacy Gaiji Solutions & SING

Ideographic Variation Sequences

ISO/IEC JTC/1 SC/2 WG/2 N2095

Workshop: Manipulating CID-Keyed Fonts Using AFDKO Tools

2011 Martin v. Löwis. Data-centric XML. Character Sets

2007 Martin v. Löwis. Data-centric XML. Character Sets

Introduction 1. Chapter 1

Unicode character. Unicode JIS X 0213 GB *2. Unicode character *3. John Mauchly Short Order Code character. Unicode Unicode ASCII.

ISO/IEC JTC1/SC2/WG2 N 2490

1. Response to IRGN2305 CJK Regional Supplementary Ideographs

The Unicode Standard Version 7.0 Core Specification

Typesetting CJK languages with Omega

Universal Multiple-Octet Coded Character Set UCS

ISO/IEC JTC1/SC2/WG2 N3244

The Localization (CJK) Challenges and Possibilities in Taiwan

Rendering in Dzongkha

Form number: N2352-F (Original ; Revised , , , , , , ) N2352-F Page 1 of 7

The Unicode Standard Version 7.0 Core Specification

John H. Jenkins If available now, identify source(s) for the font (include address, , ftp-site, etc.) and indicate the tools used:

Adobe. SING Technology. Solving the missing character problem. Thomas Phinney Program Manager Fonts & SING Technologies 28 September 2006

1 ISO/IEC JTC1/SC2/WG2 N

The Unicode Standard Version 6.1 Core Specification

Two distinct code points: DECIMAL SEPARATOR and FULL STOP

This manual describes utf8gen, a utility for converting Unicode hexadecimal code points into UTF-8 as printable characters for immediate viewing and

The Unicode Standard Version 12.0 Core Specification

ISO/IEC JTC1/SC2/WG2. Universal Multiple-Octet Coded Character Set (UCS) - ISO/IEC Secretariat: ANSI

Universal Multiple-Octet Coded Character Set UCS

uptex Unicode version of ptex with CJK extensions

Issues and Solutions in Pan-China Search

Adobe Illustrator CS2 Read Me Adobe Systems Inc. March 2005

The Unicode Standard Version 10.0 Core Specification

OpenType Font by Harsha Wijayawardhana UCSC

Proposal to add U+2B95 Rightwards Black Arrow to Unicode Emoji

Unicode definition list

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057

AFP Support for TrueType/Open Type Fonts and Unicode

136 TUGboat, Volume 39 (2018), No. 2

ISO/IEC and the Construction of the Circumpacif is Documents Information Network

The Unicode Standard Version 11.0 Core Specification

****This proposal has not been submitted**** ***This document is displayed for initial feedback only*** ***This proposal is currently incomplete***

UNICODE IDEOGRAPHIC VARIATION DATABASE

1. Introduction 2. TAMIL DIGIT ZERO JTC1/SC2/WG2 N Character proposed in this document About INFITT and INFITT WG

State of CJK issues of LibreOffice, m TIRANA 27 Sept.

Proposed Update Unicode Standard Annex #11 EAST ASIAN WIDTH

The Unicode Standard Version 11.0 Core Specification

L2/ ISO/IEC JTC1/SC2/WG2 N4671

ISO/IEC JTC 1/SC 2 N 4004 DATE:

ä + ñ ISO/IEC JTC1/SC2/WG2 N

Introduction. Requests. Background. New Arabic block. The missing characters

CID-Keyed Font Technology Overview

Recent developments in IDNs

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC 1/SC 2/WG 2/IRG

ISO/IEC JTC1/SC2/WG2 N4599 L2/

Proposed Update. Unicode Standard Annex #11

Proposal to encode three Arabic characters for Arwi

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC

Proposal to Encode Oriya Fraction Signs in ISO/IEC 10646

Comments on responses to objections provided in N2661

THE UNIVERSITY OF HONG KONG LIBRARIES. Hong Kong Collection. gift from Hong Kong (China). Companies Registry

The Unicode Standard Version 6.0 Core Specification

Representing Characters, Strings and Text

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 22: Open Font Format

ISO/IEC JTC1/SC2 Nxxxx ISO/IEC JTC1/SC2/WG2/IRG N2214

Doc Type: Working Group Document Title: Recommendations for encoding Xiàngqí game symbols Source: Michael Everson (

ISO/IEC/JTC 1/SC 2/WG 2/IRG Ideographic Rapporteur Group (IRG) Resolutions of IRG Meeting #32

L2/ Date:

ATypI : TypeTech Forum Lissabon OpenType Status Dr. Jürgen Willrodt Dr. OpenType Status 2006

ISO/IEC JTC 1/SC 2 N 4540 ISO/IEC JTC 1/SC 2/WG2 N 4830

Representing Characters and Text

This proposal is limited to the addition and rearrangement of some of the Korean character part of ISO/IEC (UCS2).

Google 1 April A Generalized Unified Character Code: Western European and CJK Sections

ADOBE 9A Adobe InDesign CS3 ACE. Download Full Version :

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC 1/SC 2/WG 2/IRG

DATE: Time: 12:28 AM N SupportN2621 Page: 1 of 9 ISO/IEC JTC 1/SC 2/WG 2

Reply to L2/10-327: Comments on L2/10-280, Proposal to Add Variation Sequences... 1

Multilingual vi Clones: Past, Now and the Future

ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC / UNICODE

Title: Application to include Arabic alphabet shapes to Arabic 0600 Unicode character set

Communication through the language barrier in some particular circumstances by means of encoded localizable sentences

General Structure 2. Chapter Architectural Context

Pour les connaisseurs! part c liza s bonus appendix

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC 1/SC 2/WG 2/IRG

ISO/IEC JTC/1 SC/2 WG/2 N2312. ISO/IEC JTC/1 SC/2 WG/2 Universal Multiple-Octet Coded Character Set (UCS)

A Basic Course in Font Wrangling Session 23 Saturday, September 12, 2009

1. Introduction 2. TAMIL LETTER SHA Character proposed in this document About INFITT and INFITT WG

RICOH s Layout Engine can correctly display Thai, Arabic,Vietnamese and Hindi with complex grammar rules. Reading in this direction

preliminary draft, June 15, :57 preliminary draft, June 15, :57

JTC1/SC2/WG2 N3514. Proposal to encode a SOCCER BALL symbol Karl Pentzlin, U+26xx SOCCER BALL

Use of ZWJ/ZWNJ with Mongolian Variant Selectors and Vowel Separator SOURCE: Paul Nelson and Asmus Freytag STATUS: Proposal

Transcription:

Designing & Developing Pan-CJK Fonts for Today Ken Lunde Adobe Systems Incorporated 2009 Adobe Systems Incorporated. All rights reserved. 1

What Is A Pan-CJK Font? A Pan-CJK font includes glyphs suitable for multiple CJK locales China, Taiwan, Hong Kong, Japan, and Korea are the five most important CJK locales Han Unification necessitates multiple glyphs for many CJK Unified Ideograph code points A Pan-CJK font is Unicode-based No other character set or encoding in common use today can claim adequate CJK support Unicode has become the preferred method for representing text in digital form A Pan-CJK font is a lot of work How does a Pan-CJK font differ from a Pan-Chinese font? There are three primary Chinese-speaking locales: China, Taiwan, and Hong Kong Some simplified/traditional distinctions have been unified Some distinctions have not been unified, and are thus handled via separate code points A Pan-Chinese font can be treated as a step toward developing a Pan-CJK font 2009 Adobe Systems Incorporated. All rights reserved. 2

Pan-CJK Font Goals Single-locale CJK fonts can be fully-functional with one glyph per code point This is easily demonstrated by today s single-locale CJK fonts Some single-locale CJK fonts still require multiple glyphs for some code points For the purpose of supporting single-locale variant forms Multiple-locale CJK fonts require more than one glyph for some code points CJK Unified Ideograph code points are the obvious target and concern 2009 Adobe Systems Incorporated. All rights reserved. 3

Pan-CJK Font Advantages Design consistency across multiple locales Weight Style Width Relative size Other design factors Smaller overall footprint A large number of glyphs are shared by two or more locales Subroutinization benefits Single font file Streamlined testing 2009 Adobe Systems Incorporated. All rights reserved. 4

CJK Unified Ideographs: URO Versus Extensions Premise: CJK Unified Ideograph code points require multiple glyphs Some code points require only one glyph many are single-source code points Some require four or more glyphs these are multiple-source code points Single- versus multiple-source code points Single-source code points generally require only one glyph Multiple-source code points have the potential to require more than one glyph For example, U+4E00 has six sources, but clearly requires only one glyph URO (Unified Repertoire & Ordering) 20,902 + 22 (Unicode 4.1) + 8 (Unicode 5.1) + 8 (Unicode 5.2) = 20,940 code points Extensions Extensions A (6,582), B (42,711), C (4,149), and D (223) exist The higher the Extension, the greater the percentage of single-source code points The higher the Extension, the lower the percentage of multiple-source code points 2009 Adobe Systems Incorporated. All rights reserved. 5

CJK Unified Ideographs: URO Versus Extensions (cont d) (Number of Sources) 1 2 3 4 5 6 7 URO (20,940) 9% 7% 11% 18% 32% 22% 1% Extension A (6,582) 9% 27% 41% 20% 3% >0% >0% Extension B (42,711) 45% 41% 14% 1% >0% Extension C (4,149) 91% 8% >0% >0% 2009 Adobe Systems Incorporated. All rights reserved. 6

CJK Unified Ideographs: URO Versus Extensions (cont d) Significantly more work is required for URO code points The URO has a high percentage of multiple-source code points Remember that multiple-source code points have the potential to require multiple glyphs The higher the Extension, the less work that is required Higher Extensions have a higher percentage of single-source code points Code-point/glyph-count ratios The URO and Extension A require roughly a 50% increase in glyphs over code points Approximately 30K glyphs are necessary to cover the 20,940 URO code points Approximately 10K glyphs are necessary to cover the 6,582 Extension A code points 2009 Adobe Systems Incorporated. All rights reserved. 7

Locale-specific Glyph Issues Different glyphs for the same locale Code charts versus current sources Some sources have changed over time JIS X 0213:2004 (Japan) is a good example Handling CJK Unified Ideographs without sources for specific locales For some, such as those specific to a single locale, it is appropriate to ignore Simplified Chinese is a good example For the remainder, it becomes a policy issue Extrapolate or ignore 2009 Adobe Systems Incorporated. All rights reserved. 8

Locale-specific Glyph Issues Specific Examples Source glyphs that changed over time: U+8FBB Original Japan source: (JIS X 0208-1990 36-52) Current Japan source: (JIS X 0213:2004 1-36-52) Multiple-source CJK Unified Ideographs that require only one glyph: U+4E00 All sources: Two glyphs serve more than two code points: U+5668, U+FA38 & U+20F96 U+5668 glyphs for Japan, and for all other sources U+FA38 glyph (ignoring that the distinction which is meant to be preserve cannot be preserved) for Japan U+20F96 glyph for Taiwan 2009 Adobe Systems Incorporated. All rights reserved. 9

Pan-CJK Font Implementation Details TrueType Collection via separate font instances Pro: GSUB feature support is not necessary Con: Multiple font instances in application font menus Can also be considered a Pro OpenType via locl GSUB feature Pro: Single font instance in application font menus Can also be considered a Con Con: locl GSUB feature support is necessary; must choose default locale Dealing with the 64K glyph barrier Depends on the extent to which CJK Unified Ideograph blocks are covered This is a clear concern when supporting all of Extension B 2009 Adobe Systems Incorporated. All rights reserved. 10

Implementing Pan-CJK Fonts: OpenType Use the Adobe-Identity-0 ROS ROS means /Registry = Adobe ; /Ordering = Identity ; /Supplement = 0 A dynamic, locale-unspecific special-purpose glyph set Use the locl (Localized Forms) GSUB feature One locale must necessarily serve as the default Simplified Chinese is a suitable default locale due to GB 18030 s broad coverage The remaining locales are supported via substitutions defined in the locl GSUB feature Language and script tags must be specified Simplified Chinese = ZHS/hani Traditional Chinese = ZHT/hani (Taiwan) and ZHH/hani (Hong Kong) Japanese = JAN/kana Korean = KOR/hang Fully-functional prototype fonts have been built 2009 Adobe Systems Incorporated. All rights reserved. 11

Other Pan-CJK Font Implementations TrueType Collection (TTC) Single font file with multiple font instances Each supported locale has its own font instance Separate font instances can share common glyphs Application font menus advertise multiple font instances, one for each locale The locl GSUB feature is not necessary Two iphone fonts, STHeiti-Light.ttc and STHeiti-Medium.ttc, are Pan-CJK TTC fonts Composite Font A Composite Font is a recipe that references one or more Component Fonts A Composite Font that can specify language/script can theoretically be a Pan-CJK font A Composite Font can be used to overcome or work around the 64K glyph barrier 2009 Adobe Systems Incorporated. All rights reserved. 12

Pan-CJK Font Support in OSes & Applications For TTC, Mac OS X and Windows can generally handle such fonts Applications enumerate fonts differently, so extensive application testing is required Most TTCs to date have been single-locale Multiple-locale, specifically Pan-CJK, TTCs are relatively new For OpenType, InDesign CS3 and greater supports the locl GSUB feature Can be specified in character and paragraph tags 2009 Adobe Systems Incorporated. All rights reserved. 13

Unicode Coverage Issues Which CJK Unified Ideographs should be included? Minimal coverage IICore 9,810 CJK Unified Ideographs 9,706 URO, 42 Extension A, and 62 Extension B Intermediate coverage Common standards GB 18030, Hong Kong SCS-2008, JIS X 0213:2004 and KS X 1001:2004 Equivalent to URO, Extension A, partial Extension B, and one Extension C code point GB 18030 requires all URO and Extension A code points, plus six in Extension B Hong Kong SCS-2008 requires 1,712 Extension B code points, plus one in Extension C JIS X 0213:2004 requires 303 Extension B code points Maximum coverage All of them This obviously breaks the 64K glyph barrier that is inherent in today s font formats 2009 Adobe Systems Incorporated. All rights reserved. 14

Locale-specific Considerations Hangul Kana Specific to Korean 11,172 code points Specific to Japanese, but included in standards of China, Taiwan, Hong Kong, and Korea Accounts for 70% of Japanese text, so the glyph design must be good Requires vertical variants for the small versions and for the long vowel mark Vertical variants for punctuation and kana Some vertical variants are locale-specific 2009 Adobe Systems Incorporated. All rights reserved. 15

Pan-CJK Font Prototype Details A proof of concept OpenType font Makes use of the locl GSUB feature 44,000 glyphs 29,925 code points URO + Extension A + partial Extension B close to intermediate coverage Supplied by Changzhou SinoType The default locale is Simplified Chinese 11,267 locl GSUB feature substitutions for Traditional Chinese 8,106 locl GSUB feature substitutions for Japanese 5,312 locl GSUB feature substitutions for Korean Its glyphs have not been extensively checked An IICore subset version includes 15,770 glyphs minimal coverage 2009 Adobe Systems Incorporated. All rights reserved. 16

Demo InDesign + OpenType Pan-CJK font prototype Use of paragraph tags Use of character tags 2009 Adobe Systems Incorporated. All rights reserved. 17

Future Predictions Today, Pan-CJK fonts clearly require multiple glyphs for many code points It is difficult to argue this point due to locale-specific conventions In the future, cross-cultural unification efforts are possible Unicode may serve as the catalyst The Web is making the world smaller, and cross-cultural interaction is ever-increasing This is not likely during the current generation, but perhaps within 25 years 2009 Adobe Systems Incorporated. All rights reserved. 18

Further Reading & Resources CJKV Information Processing, Second Edition (O Reilly Media, 2009) http://oreilly.com/catalog/9780596514471/ OpenType Specification http://www.microsoft.com/typography/otspec/ The Unicode Consortium http://www.unicode.org/ 2009 Adobe Systems Incorporated. All rights reserved. 19

2009 Adobe Systems Incorporated. All rights reserved. 20