Development of Character Object Technology with Character Databases


MORIOKA Tomohiko, Christian Wittern

ABSTRACT. The CHISE (CHaracter Information Service Environment) project is developing a character processing system based on a proposed character object model. This model is built on character property databases instead of coded character sets. The system currently consists of two subsystems: XEmacs UTF-2000 and a prototype Topic Maps engine built on Zope. XEmacs UTF-2000 is an extensible editor with an embedded character database. Within XEmacs UTF-2000, each character is created as an object defined by a set of character attributes. To achieve greater expressive power, a topic map of characters based on the ISO Topic Maps standard (ISO/IEC 13250) is under development. A prototype topic map engine based on the Zope object database server has been built to maintain this topic map. In addition, a database of glyph expressions using IDS (Ideographic Description Sequences) has been developed for the more than […] Chinese characters contained in ISO/IEC […]:2000.

[Japanese text not preserved; recoverable terms: CHISE (CHaracter Information Service Environment), SGML/XML, UTF-2000, Topic Maps [3], XEmacs [9], XEmacs UTF-2000, WWW, Zope [10].]
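The abstract's central idea is that a character is an object defined by a set of attributes, and that a coded character set is merely one kind of attribute. The short Python sketch below illustrates that idea. It is not the Emacs Lisp `define-char` of XEmacs UTF-2000, whose argument format does not survive here. The class names and the `define_char`/`decode` methods are hypothetical. The attribute names `ucs`, `morohashi-daikanwa` and `ideographic-structure` are borrowed from terms in the paper, but their treatment here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """A character object: identity plus a bag of named attributes."""
    attributes: dict = field(default_factory=dict)

    def get(self, name, default=None):
        return self.attributes.get(name, default)


class CharacterDatabase:
    """Toy registry of character objects (a sketch, not CHISE's code).

    A code point is not the character itself; it is just another attribute.
    Attributes listed in CODED_CHARSET_ATTRIBUTES are additionally indexed
    so that a code in that character set can be decoded to the object.
    """

    # Hypothetical choice of attributes that behave like coded character sets.
    CODED_CHARSET_ATTRIBUTES = ("ucs", "morohashi-daikanwa")

    def __init__(self):
        self._by_code = {}  # (charset attribute, code) -> Character

    def define_char(self, attributes):
        """Create a character, or merge into one that already owns a code."""
        char = None
        for name in self.CODED_CHARSET_ATTRIBUTES:
            if name in attributes:
                char = self._by_code.get((name, attributes[name]))
                if char is not None:
                    break
        if char is None:
            char = Character()
        char.attributes.update(attributes)
        # Index every coded-charset attribute the object now carries.
        for name in self.CODED_CHARSET_ATTRIBUTES:
            if name in char.attributes:
                self._by_code[(name, char.attributes[name])] = char
        return char

    def decode(self, charset, code):
        """Return the character object for `code` in `charset`, or None."""
        return self._by_code.get((charset, code))


if __name__ == "__main__":
    db = CharacterDatabase()
    db.define_char({"ucs": 0x8AAA, "ideographic-structure": "⿰言兌"})
    db.define_char({"ucs": 0x8AAA, "total-strokes": 14})  # merged, not duplicated
    print(db.decode("ucs", 0x8AAA).attributes)
```

Defining the same `ucs` code point twice merges the attributes into one object. This is one way to read "character object": identity belongs to the object, and each coded character set is only an index into it.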

[Figure 1: XEmacs UTF-2000 (rest of caption not preserved).]
[Japanese text not preserved; recoverable terms: XEmacs UTF-2000, XEmacs [9], GNU Emacs [7], Emacs Lisp [6], GNU Emacs/XEmacs, WWW, 30 bit, id, define-char.]

[Japanese text not preserved; recoverable terms: coded-charset, char-table, char-id-table, lazy-loading, XEmacs-Mule, define-char, Emacs Lisp, CHISE, i386 Linux, Unicode Database, CNS, CDP (Chinese Document Processing), CBETA (Chinese Buddhist Electronic Text Association), CHINA3, The Unicode Standard [8], morohashi-daikanwa, =>ucs, Berkeley DB Version 3, Debian GNU/Linux (sid), TM 5800, dump, ISO/IEC […] [4], IDS (Ideographic Description Sequence), IDC (Ideographic Description Characters), ideographic-structure. Size figures survive without their table context: 10 MB, 40 MB, 27 MB, […] MB (strip 22 MB), 15 MB (strip 10 MB), 10 MB (strip 6 MB), 5 MB.]
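The surviving fragments pair "lazy-loading" with Berkeley DB and with the size of the dumped XEmacs image. This suggests that attribute values are read from the database on demand instead of being dumped into the editor. The Python sketch below illustrates that general technique only, under stated assumptions:

- Python's standard `dbm.dumb` module stands in for Berkeley DB.
- The layout of one key-value file per attribute, keyed by hexadecimal code point, is hypothetical.
- The class and method names are invented for illustration.

```python
import dbm.dumb
import os
import tempfile


class LazyAttributeStore:
    """Read character attributes from disk only when first asked for.

    Assumed layout (hypothetical): one key-value file per attribute, keyed
    by the code point in hex. Python's dbm.dumb stands in for Berkeley DB.
    """

    def __init__(self, directory):
        self.directory = directory
        self._cache = {}      # (code point, attribute) -> value or None
        self.disk_reads = 0   # number of lookups that actually hit the files

    def _path(self, attribute):
        return os.path.join(self.directory, attribute)

    def save(self, attribute, table):
        """Write {code point: str value} for one attribute to its own file."""
        with dbm.dumb.open(self._path(attribute), "c") as db:
            for code_point, value in table.items():
                db[format(code_point, "X")] = value.encode("utf-8")

    def get(self, code_point, attribute):
        key = (code_point, attribute)
        if key not in self._cache:
            # First request for this value: go to disk, then remember it.
            with dbm.dumb.open(self._path(attribute), "r") as db:
                raw = db.get(format(code_point, "X"))
            self._cache[key] = None if raw is None else raw.decode("utf-8")
            self.disk_reads += 1
        return self._cache[key]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LazyAttributeStore(tmp)
        store.save("ideographic-structure", {0x8AAA: "⿰言兌"})
        print(store.get(0x8AAA, "ideographic-structure"))  # read from the file
        print(store.get(0x8AAA, "ideographic-structure"))  # served from the cache
        print(store.disk_reads)
```

The design trade-off is that nothing is paid for attributes that are never consulted. Each distinct value costs one file access the first time and none afterwards.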

[Figure 2: Ideographic Description Characters.]
[Japanese text not preserved; recoverable terms: ASCII, CDP (Chinese Document Processing), IDS, IDC, ideographic-structure, XEmacs UTF-2000, CHISE, XML, Topic Maps, CBETA, GPL, GT, Unicode, ISO/IEC […] A, ISO/IEC […] B [5], Big5, quail, GNU Emacs/XEmacs, Zope, script, Christian Wittern, 1994. Fragments of an IDS notation example survive: "A B", "A B + C", "(((( ) ) + ( / )) )/".]
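Only fragments of the paper's explanation of IDS survive, along with the Figure 2 caption naming the Ideographic Description Characters. As a hedged illustration of how an IDS is read, here is a small Python parser for prefix-notation IDS strings. It assumes the twelve IDCs U+2FF0 to U+2FFB introduced in Unicode 3.0. Of these, ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands and the rest take two. The nested-tuple output is an illustrative choice, not CHISE's `ideographic-structure` format. The functions `parse_ids` and `leaf_components` are hypothetical names.

```python
# The twelve Ideographic Description Characters U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; the others take two.
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}
BINARY_IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY_IDCS


def parse_ids(ids):
    """Parse a prefix-notation IDS into a nested tuple (idc, operand, ...)."""
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError(f"incomplete IDS: {ids!r}")
        ch = ids[pos]
        pos += 1
        if ch in BINARY_IDCS:
            arity = 2
        elif ch in TERNARY_IDCS:
            arity = 3
        else:
            return ch  # a leaf component
        return (ch,) + tuple(parse_node() for _ in range(arity))

    tree = parse_node()
    if pos != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def leaf_components(tree):
    """List the components of a parsed IDS from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for operand in tree[1:] for leaf in leaf_components(operand)]


if __name__ == "__main__":
    tree = parse_ids("⿰言⿱八兄")
    print(tree)
    print(leaf_components(tree))
```

Because every IDC has a fixed arity, the prefix notation needs no brackets. A single left-to-right pass recovers the tree, and a string that runs out early or has leftovers is rejected.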

[Figure 3: <occurrence> (rest of caption not preserved).]
[Figure 4: define-char for U+8AAA.]
[Japanese text not preserved; recoverable terms: Character Topic Maps (CTM), Topic Maps, define-char, Lisp, SGML/XML, topic, occurrence, <association>, ideographic-structure, Zope (Zope Object Publishing Environment), Zope Corporation (Digital Creations), Web, C, Python, Topic Maps (2000) [3], SGML [1], HyTime [2] Architectural Forms, DTD (Document Type Definition), XML, XTM (XML Topic Maps), ISO amendment, HTTP, WebDAV, XML-RPC, XEmacs.]
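The surviving terms name the three Topic Maps constructs the paper builds on: topics, occurrences and associations. Below is a hedged, in-memory Python sketch of those constructs, populated with a hypothetical fragment of a character topic map for U+8AAA (說). The topic ids, the role names `whole`/`left`/`right` and the class and method names are all assumptions. Nothing here reflects the paper's Zope implementation or the XTM syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    base_names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)  # (type, resource) pairs


@dataclass
class Association:
    type: str
    members: dict  # role name -> topic id


class TopicMap:
    """Minimal topic map: topics with occurrences, linked by associations."""

    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, topic_id, *base_names):
        topic = self.topics.setdefault(topic_id, Topic(topic_id))
        topic.base_names.extend(base_names)
        return topic

    def add_occurrence(self, topic_id, occurrence_type, resource):
        self.topics[topic_id].occurrences.append((occurrence_type, resource))

    def associate(self, association_type, **members):
        # Every member must already be a topic in this map.
        for topic_id in members.values():
            if topic_id not in self.topics:
                raise KeyError(f"unknown topic: {topic_id}")
        self.associations.append(Association(association_type, dict(members)))

    def related(self, topic_id):
        """Yield (association type, own role, other role, other topic)."""
        for assoc in self.associations:
            own_roles = [r for r, t in assoc.members.items() if t == topic_id]
            for own_role in own_roles:
                for other_role, other in assoc.members.items():
                    if other_role != own_role:
                        yield assoc.type, own_role, other_role, other


if __name__ == "__main__":
    tm = TopicMap()
    tm.add_topic("U+8AAA", "說")
    tm.add_topic("U+8A00", "言")
    tm.add_topic("U+514C", "兌")
    tm.add_occurrence("U+8AAA", "ideographic-structure", "⿰言兌")
    tm.associate("ideographic-structure",
                 whole="U+8AAA", left="U+8A00", right="U+514C")
    for row in tm.related("U+8A00"):
        print(row)
```

Here the occurrence stores the structure as a plain resource, and the association makes the same structure navigable from each component topic. That navigability is the kind of added expressive power a topic map offers over a flat attribute list.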

[Figure 5: Character Topic Map of U+8AAA (rest of caption not preserved).]
[Figure 6: Character Topic Map of U+8AAA (rest of caption not preserved).]
[Figure 7: Zope Topic Maps (rest of caption not preserved).]
[Japanese text not preserved; recoverable terms: UTF-2000, topic map, Topic Maps Query Language (TMQL), Topic Map Basename Constraint (TMBC), topic, XEmacs UTF-2000, CHISE, Zope.]
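Figure 8 shows XEmacs UTF-2000 communicating with Zope and a relational database server. The surviving fragments list HTTP, WebDAV and XML-RPC among Zope's interfaces, but which of these XEmacs UTF-2000 actually used cannot be recovered. The Python sketch below assumes XML-RPC and uses the standard library's `SimpleXMLRPCServer` as a stand-in for Zope. The remote method `get_attribute` and the in-memory `CHAR_ATTRIBUTES` table are hypothetical.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory stand-in for the character data held on the server.
CHAR_ATTRIBUTES = {
    "U+8AAA": {"ideographic-structure": "⿰言兌"},
}


def get_attribute(char_id, attribute):
    """Return one attribute value, or an empty string if it is unknown."""
    return CHAR_ATTRIBUTES.get(char_id, {}).get(attribute, "")


def start_server():
    """Serve get_attribute over XML-RPC on a free local port, in a thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_attribute, "get_attribute")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, char_id, attribute):
    """Client side: call the remote method as an editor-side client might."""
    host, port = server.server_address[:2]
    with xmlrpc.client.ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.get_attribute(char_id, attribute)


if __name__ == "__main__":
    srv = start_server()
    print(fetch(srv, "U+8AAA", "ideographic-structure"))
    srv.shutdown()
    srv.server_close()
```

An empty string is returned for unknown attributes because plain XML-RPC has no null value unless both sides enable the `allow_none` extension.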

[Figure 8: Communication between XEmacs UTF-2000, Zope and a relational database server.]
[Japanese text not preserved; recoverable terms: Topic Maps, XEmacs UTF-2000, Zope, topic map, PostgreSQL.]

[1] International Organization for Standardization (ISO). Information processing: Text and office systems: Standard Generalized Markup Language (SGML). ISO 8879:1986.
[2] International Organization for Standardization (ISO). Information processing: Text and office systems: Hypermedia/Time-based Structuring Language (HyTime). ISO 10744:1997.
[3] International Organization for Standardization (ISO). Information technology: SGML Applications: Topic Maps, January […]. ISO/IEC 13250:2000.
[4] International Organization for Standardization (ISO). Information technology: Universal Multiple-Octet Coded Character Set (UCS), Part 1: Architecture and Basic Multilingual Plane (BMP), March […]. ISO/IEC 10646-1:2000.
[5] International Organization for Standardization (ISO). Information technology: Universal Multiple-Octet Coded Character Set (UCS), Part 2: Supplementary Planes, November […]. ISO/IEC 10646-2.
[6] Bil Lewis, Dan LaLiberte, and Richard Stallman. GNU Emacs Lisp Reference Manual, for Emacs Version […]. Free Software Foundation, 2.5 edition, May […].
[7] Richard M. Stallman et al. GNU Emacs version 21.2. ftp://ftp.gnu.org/gnu/emacs-21.2.tar.gz, March […].
[8] The Unicode Consortium. The Unicode Standard, Version 3.0, February […].
[9] XEmacs.
[10] Zope.

Universal Multiple-Octet Coded Character Set

Universal Multiple-Octet Coded Character Set Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation internationale de normalisation Me/dunarodna[ organizaci[ po standartizacii ISO/IEC JTC 1/SC 2/WG

More information

Unicode character. Unicode JIS X 0213 GB *2. Unicode character *3. John Mauchly Short Order Code character. Unicode Unicode ASCII.

Unicode character. Unicode JIS X 0213 GB *2. Unicode character *3. John Mauchly Short Order Code character. Unicode Unicode ASCII. Unicode character 2004 2 19 1 ( ) John Mauchly Short Order Code 1949 *1 1967 ASCII ASCII (ISO 2022 Mule ) (Unicode ISO/IEC 10646 ) (IBM NEC ) (e (s-moro@hanazono.ac.jp) *1 Fortran 1957 GT ) Unicode JIS

More information

On the Missing-Characters (Gaiji) of the Taisho Tripitaka Text Database Published by SAT

On the Missing-Characters (Gaiji) of the Taisho Tripitaka Text Database Published by SAT On the Missing-Characters (Gaiji) of the Taisho Tripitaka Text Database Published by SAT Shigeki Moro The Association for the Computerization of Buddhist Texts, Japan 0 ABSTRACT In March of 1998, the Association

More information

2011 Martin v. Löwis. Data-centric XML. Character Sets

2011 Martin v. Löwis. Data-centric XML. Character Sets Data-centric XML Character Sets Character Sets: Rationale Computer stores data in sequences of bytes each byte represents a value in range 0..255 Text data are intended to denote characters, not numbers

More information

2007 Martin v. Löwis. Data-centric XML. Character Sets

2007 Martin v. Löwis. Data-centric XML. Character Sets Data-centric XML Character Sets Character Sets: Rationale Computer stores data in sequences of bytes each byte represents a value in range 0..255 Text data are intended to denote characters, not numbers

More information

This manual describes utf8gen, a utility for converting Unicode hexadecimal code points into UTF-8 as printable characters for immediate viewing and

This manual describes utf8gen, a utility for converting Unicode hexadecimal code points into UTF-8 as printable characters for immediate viewing and utf8gen Paul Hardy This manual describes utf8gen, a utility for converting Unicode hexadecimal code points into UTF-8 as printable characters for immediate viewing and as byte sequences suitable for including

More information

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057

ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057 ISO/IEC JTC 1/SC 2 N 3332/WG2 N 2057 Date: 1999-06-22 ISO/IEC JTC 1/SC 2 CODED CHARACTER SETS SECRETARIAT: JAPAN (JISC) DOC TYPE: TITLE: SOURCE: Other document National Body Comments on SC 2 N 3297, WD

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

Part III: Survey of Internet technologies

Part III: Survey of Internet technologies Part III: Survey of Internet technologies Content (e.g., HTML) kinds of objects we re moving around? References (e.g, URLs) how to talk about something not in hand? Protocols (e.g., HTTP) how do things

More information

Extended Character Sets for UCAS Systems

Extended Character Sets for UCAS Systems Extended Character Sets for UCAS Systems Admissions Conference 2010 Mike Gwyer ASCII The American Standard Code for Information Interchange A character-encoding scheme based on the ordering of the English

More information

Recent Trends in Standardization of Japanese Character Codes

Recent Trends in Standardization of Japanese Character Codes Recent Trends in Standardization of Japanese Character Codes Taichi Kawabata Abstract Character encodings are a basic and fundamental layer of digital text that are necessary for exchanging information

More information

Category: Informational 1 April 2005

Category: Informational 1 April 2005 Network Working Group M. Crispin Request for Comments: 4042 Panda Programming Category: Informational 1 April 2005 Status of This Memo UTF-9 and UTF-18 Efficient Transformation Formats of Unicode This

More information

ISO/IEC JTC1/SC2/WG2 N 2490

ISO/IEC JTC1/SC2/WG2 N 2490 ISO/IEC JTC1/SC2/WG2 N 2490 Date: 2002-05-21 ISO/IEC JTC1/SC2/WG2 Coded Character Set Secretariat: Japan (JISC) Doc. Type: Disposition of comments Title: Proposed Disposition of comments on SC2 N 3585

More information

Introduction 1. Chapter 1

Introduction 1. Chapter 1 This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

WAP Binary XML Content Format Proposed Version 15-Aug-1999

WAP Binary XML Content Format Proposed Version 15-Aug-1999 WAP Binary XML Content Format Proposed Version 15-Aug-1999 Wireless Application Protocol Binary XML Content Format Specification Version 1.2 Disclaimer: This document is subject to change without notice.

More information

Chapter 10: Understanding the Standards

Chapter 10: Understanding the Standards Disclaimer: All words, pictures are adopted from Learning Web Design (3 rd eds.) by Jennifer Niederst Robbins, published by O Reilly 2007. Chapter 10: Understanding the Standards CSc2320 In this chapter

More information

Extensions for the programming language C to support new character data types VERSION FOR PDTR APPROVAL BALLOT. Contents

Extensions for the programming language C to support new character data types VERSION FOR PDTR APPROVAL BALLOT. Contents Extensions for the programming language C to support new character data types VERSION FOR PDTR APPROVAL BALLOT Contents 1 Introduction... 2 2 General... 3 2.1 Scope... 3 2.2 References... 3 3 The new typedefs...

More information

Free & Open Source Software: The Academic Future

Free & Open Source Software: The Academic Future Free & Open Source Software: The Academic Future Paul E. Johnson University of Kansas http://lark.cc.ku.edu/~pauljohn Presentation at Ukrainian National University of L'viv May 27, 2005

More information

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley.

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however

More information

Unicode: What is it and how do I use it?

Unicode: What is it and how do I use it? Abstract: The rationale for Unicode and its design goals and detailed design principles are presented. The correspondence between Unicode and ISO/IEC 10646 is discussed, the scripts included or planned

More information

Graphical Notation for Topic Maps (GTM)

Graphical Notation for Topic Maps (GTM) Graphical Notation for Topic Maps (GTM) 2005.11.12 Jaeho Lee University of Seoul jaeho@uos.ac.kr 1 Outline 2 Motivation Requirements for GTM Goals, Scope, Constraints, and Issues Survey on existing approaches

More information

Tex with Unicode Characters

Tex with Unicode Characters Tex with Unicode Characters 7/10/18 Presented by: Yuefei Xiang Agenda ASCII Code Unicode Unicode in Tex Old Style Encoding -Inputenc, -ucs Morden Encoding -XeTeX -LuaTeX Unicode bi-direction in Tex -Emacs-AucTeX

More information

Network Working Group. September 24, XML Media Types draft-murata-xml-00.txt. Status of this Memo

Network Working Group. September 24, XML Media Types draft-murata-xml-00.txt. Status of this Memo Network Working Group Internet-Draft Expires: March 24, 2000 M. Murata Fuji Xerox Information Systems S. St.Laurent September 24, 1999 XML Media Types draft-murata-xml-00.txt Status of this Memo This document

More information

draft-ietf-idn-idna-02.txt Internationalizing Host Names In Applications (IDNA) Status of this Memo

draft-ietf-idn-idna-02.txt Internationalizing Host Names In Applications (IDNA) Status of this Memo Internet Draft draft-ietf-idn-idna-02.txt June 16, 2001 Expires in six months Patrik Faltstrom Cisco Paul Hoffman IMC & VPNC Status of this Memo Internationalizing Host Names In Applications (IDNA) This

More information

Java Multilingual Elementary Tool

Java Multilingual Elementary Tool November 28, 2004 Outline Designing Outline Multilingual system: refer to computer programs which permit user interaction with the computer in one or more languages A Java multilingual elementary tool

More information

The Adobe-CNS1-6 Character Collection

The Adobe-CNS1-6 Character Collection Adobe Enterprise & Developer Support Adobe Technical Note # bc The Adobe-CNS- Character Collection Introduction The purpose of this document is to define and describe the Adobe-CNS- character collection,

More information

MONTHLY TEST MAY 2017 QUESTION BANK FOR AVERAGE STUDENTS. Q.2 What is free software? How is it different from Open Source Software?

MONTHLY TEST MAY 2017 QUESTION BANK FOR AVERAGE STUDENTS. Q.2 What is free software? How is it different from Open Source Software? MONTHLY TEST MAY 2017 QUESTION BANK FOR AVERAGE STUDENTS Q.1. What is OSS? It refers to Open Source Software, which are modifiable, redistributable but may or may not be available free of cost. Source

More information

ISO/IEC INTERNATIONAL STANDARD. Information technology ASN.1 encoding rules: XML Encoding Rules (XER)

ISO/IEC INTERNATIONAL STANDARD. Information technology ASN.1 encoding rules: XML Encoding Rules (XER) INTERNATIONAL STANDARD ISO/IEC 8825-4 First edition 2002-12-15 Information technology ASN.1 encoding rules: XML Encoding Rules (XER) Technologies de l'information Règles de codage ASN.1: Règles de codage

More information

Request for Comments: 2482 Category: Informational Spyglass January Language Tagging in Unicode Plain Text. Status of this Memo

Request for Comments: 2482 Category: Informational Spyglass January Language Tagging in Unicode Plain Text. Status of this Memo Network Working Group Request for Comments: 2482 Category: Informational K. Whistler Sybase G. Adams Spyglass January 1999 Status of this Memo Language Tagging in Unicode Plain Text This memo provides

More information

Lloyd Rutledge, Lynda Hardman, Jacco van Ossenbruggen* and Dick C.A. Bulterman

Lloyd Rutledge, Lynda Hardman, Jacco van Ossenbruggen* and Dick C.A. Bulterman Lloyd Rutledge, Lynda Hardman, Jacco van Ossenbruggen* and Dick C.A. Bulterman CWI P.O. Box 94079 1090 GB Amsterdam, The Netherlands E-mail: {lloyd,lynda,dcab}@cwi.nl *Vrije Universiteit Dept. of Math.

More information

STANDARD ST.66 DECEMBER 2007 CHANGES

STANDARD ST.66 DECEMBER 2007 CHANGES Ref.: Standards - ST.66 Changes STANDARD ST.66 DECEMBER 2007 CHANGES Pages REFERENCES... 2 Editorial changes... 2 REQUIREMENTS OF THE STANDARD... 3 Paragraph 17, revised November 2007... 3 Paragraph 22,

More information

WAP Binary XML Content Format Document id WAP-192-WBXML Version 1.3 Approved Version 15 th May 2000

WAP Binary XML Content Format Document id WAP-192-WBXML Version 1.3 Approved Version 15 th May 2000 WAP Binary XML Content Format Document id WAP-192-WBXML-20000515 Version 1.3 Approved Version 15 th May 2000 This Document Document Identifier 192 Date 15 th May 2000 Subject: Version 1.3 WBXML Wireless

More information

ISO/IEC TR This is a preview - click here to buy the full publication TECHNICAL REPORT. First edition

ISO/IEC TR This is a preview - click here to buy the full publication TECHNICAL REPORT. First edition This is a preview - click here to buy the full publication TECHNICAL REPORT ISO/IEC TR 19769 First edition 2004-07-15 Information technology Programming languages, their environments and system software

More information

Obsoletes: 2070, 1980, 1942, 1867, 1866 Category: Informational June 2000

Obsoletes: 2070, 1980, 1942, 1867, 1866 Category: Informational June 2000 Network Working Group Request for Comments: 2854 Obsoletes: 2070, 1980, 1942, 1867, 1866 Category: Informational D. Connolly World Wide Web Consortium (W3C) L. Masinter AT&T June 2000 The text/html Media

More information

2009 Martin v. Löwis. Data-centric XML. XML Syntax

2009 Martin v. Löwis. Data-centric XML. XML Syntax Data-centric XML XML Syntax 2 What Is XML? Extensible Markup Language Derived from SGML (Standard Generalized Markup Language) Two goals: large-scale electronic publishing exchange of wide variety of data

More information

Request for Comments: 2277 BCP: 18 January 1998 Category: Best Current Practice. IETF Policy on Character Sets and Languages. Status of this Memo

Request for Comments: 2277 BCP: 18 January 1998 Category: Best Current Practice. IETF Policy on Character Sets and Languages. Status of this Memo Network Working Group H. Alvestrand Request for Comments: 2277 UNINETT BCP: 18 January 1998 Category: Best Current Practice Status of this Memo IETF Policy on Character Sets and Languages This document

More information

Australian Standard. Industrial automation systems and integration Open systems application integration framework

Australian Standard. Industrial automation systems and integration Open systems application integration framework AS ISO 15745.2 2004 ISO 15745-2:2003 AS ISO 15745.2 Australian Standard Industrial automation systems and integration Open systems application integration framework Part 2: Reference description for ISO

More information

Tutorial 1 Getting Started with HTML5. HTML, CSS, and Dynamic HTML 5 TH EDITION

Tutorial 1 Getting Started with HTML5. HTML, CSS, and Dynamic HTML 5 TH EDITION Tutorial 1 Getting Started with HTML5 HTML, CSS, and Dynamic HTML 5 TH EDITION Objectives Explore the history of the Internet, the Web, and HTML Compare the different versions of HTML Study the syntax

More information

Multilingual vi Clones: Past, Now and the Future

Multilingual vi Clones: Past, Now and the Future THE ADVANCED COMPUTING SYSTEMS ASSOCIATION The following paper was originally published in the Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference Monterey, California, USA, June

More information

ATypI Hongkong Development of a Pan-CJK Font

ATypI Hongkong Development of a Pan-CJK Font ATypI Hongkong 2012 Development of a Pan-CJK Font What is a Pan-CJK Font? Pan (greek: ) means "all" or "involving all members" of a group Pan-CJK means a Unicode based font which supports different countries

More information

Introduction to Informatics

Introduction to Informatics Introduction to Informatics Lecture : Encoding Numbers (Part II) Readings until now Lecture notes Posted online @ http://informatics.indiana.edu/rocha/i The Nature of Information Technology Modeling the

More information

CS108 Software Systems: UNIX. Fall 2011

CS108 Software Systems: UNIX. Fall 2011 CS108 Software Systems: UNIX Fall 2011 CS108 Fall 2011 2 Course Info cs.utexas.edu/ edwardsj/teaching/2011fall/cs108 CS108 Fall 2011 3 Why Linux? Multi-user, multi-process operating system Open-source

More information

XSLT-process minor mode

XSLT-process minor mode XSLT-process minor mode for version 2.2 January 2003 by Ovidiu Predescu and Tony Addyman Copyright c 2000, 2001, 2002, 2003 Ovidiu Predescu. Copyright c 2002, 2003 Tony Addyman. All rights reserved. Distributed

More information

The unified ideograph U+5FF9 that has two sources (G and T3-2623) is shown below: (see ISO/IEC 10646:2003, p.477)

The unified ideograph U+5FF9 that has two sources (G and T3-2623) is shown below: (see ISO/IEC 10646:2003, p.477) Universal Multiple-Octet Coded Character Set UCS ISO/IEC JTC1/SC2/WG2 IRG N 1666 Date: 2010-6-15 Doc. Type: Member body contribution Title: Error report on U+225D6 AND U+2F89F Source: TCA and China Status:

More information

Unicode definition list

Unicode definition list abstract character D3 3.3 2 abstract character sequence D4 3.3 2 accent mark alphabet alphabetic property 4.10 2 alphabetic sorting annotation ANSI Arabic digit 1 Arabic-Indic digit 3.12 1 ASCII assigned

More information

Using Unicode with MIME

Using Unicode with MIME Network Working Group Request for Comments: 1641 Category: Experimental Using Unicode with MIME D. Goldsmith M. Davis July 1994 Status of this Memo This memo defines an Experimental Protocol for the Internet

More information

Specification Information Note

Specification Information Note Specification Information Note WAP-183_005-ProvCont-20020411-a Version 11-Apr-2002 for Wireless Application Protocol WAP-183-ProvCont-20010724-a WAP Provisioning Content Version 24-July-2001 A list of

More information

ä + ñ ISO/IEC JTC1/SC2/WG2 N

ä + ñ ISO/IEC JTC1/SC2/WG2 N ISO/IEC JTC1/SC2/WG2 N3727 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation internationale de normalisation Международная организация по стандартизации

More information

XML Introduction 1. XML Stands for EXtensible Mark-up Language (XML). 2. SGML Electronic Publishing challenges -1986 3. HTML Web Presentation challenges -1991 4. XML Data Representation challenges -1996

More information

Contents. Topics. 01. WWW 02. WWW Documents 03. Web Service 04. Web Technologies. Management of Technology. C01-1. Documents

Contents. Topics. 01. WWW 02. WWW Documents 03. Web Service 04. Web Technologies. Management of Technology. C01-1. Documents Management of Technology Topics C01-1. Documents Code: 166125-01 Course: Management of Technology Period: Spring 2013 Professor: Sync Sangwon Lee, Ph. D 1 Contents 01. WWW 03. Web Service 04. Web Technologies

More information

INTERNATIONALIZATION IN GVIM

INTERNATIONALIZATION IN GVIM INTERNATIONALIZATION IN GVIM A PROJECT REPORT Submitted by Ms. Nisha Keshav Chaudhari Ms. Monali Eknath Chim In partial fulfillment for the award of the degree Of B. Tech Computer Engineering UNDER THE

More information

UniTerm Formats and Terminology Exchange

UniTerm Formats and Terminology Exchange Wolfgang Zenk UniTerm Formats and Terminology Exchange Abstract This article presents UniTerm, a typical representative of terminology management systems (TMS). The first part will highlight common characteristics

More information

ISO/IEC JTC1/SC2/WG2 N3787

ISO/IEC JTC1/SC2/WG2 N3787 Universal Multiple-Octet Coded Character Set UCS ISO/IEC JTC1/SC2/WG2 N3787 ISO/IEC JTC1/SC2/WG2 IRG N 1666 Date: 2010-3-25 Doc. Type: Member body contribution Title: Request for disunifying U+2F89F from

More information

GNU EPrints 2 Overview

GNU EPrints 2 Overview GNU EPrints 2 Overview Christopher Gutteridge 14th October 2002 Abstract An overview of GNU EPrints 2. EPrints is free software which creates a web based archive and database of scholarly output and is

More information

draft-hoffman-i18n-terms-02.txt July 18, 2001 Expires in six months Terminology Used in Internationalization in the IETF Status of this memo

draft-hoffman-i18n-terms-02.txt July 18, 2001 Expires in six months Terminology Used in Internationalization in the IETF Status of this memo Internet Draft draft-hoffman-i18n-terms-02.txt July 18, 2001 Expires in six months Paul Hoffman IMC & VPNC Status of this memo Terminology Used in Internationalization in the IETF This document is an Internet-Draft

More information

The HTTP protocol. Fulvio Corno, Dario Bonino. 08/10/09 http 1

The HTTP protocol. Fulvio Corno, Dario Bonino. 08/10/09 http 1 The HTTP protocol Fulvio Corno, Dario Bonino 08/10/09 http 1 What is HTTP? HTTP stands for Hypertext Transfer Protocol It is the network protocol used to delivery virtually all data over the WWW: Images

More information

AFP Support for TrueType/Open Type Fonts and Unicode

AFP Support for TrueType/Open Type Fonts and Unicode AFP Support for TrueType/Open Type Fonts and Unicode Reinhard Hohensee Distinguished Engineer October 24, 2003 Ricoh Topics What is Unicode? What are TrueType and OpenType fonts? Why have we extended the

More information

Editor s Concrete Syntax (ECS): a Profile of SGML for Editors

Editor s Concrete Syntax (ECS): a Profile of SGML for Editors Editor s Concrete Syntax (ECS): a Profile of SGML for Editors Topologi Technical Note. August 13, 2002 Rick Jelliffe SGML and XML Editing Concrete Syntax (ECS) This draft paper formalizes the lexical

More information

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 1: Systems

ISO/IEC INTERNATIONAL STANDARD. Information technology Multimedia content description interface Part 1: Systems INTERNATIONAL STANDARD ISO/IEC 15938-1 First edition 2002-07-01 Information technology Multimedia content description interface Part 1: Systems Technologies de l'information Interface de description du

More information

Administrative Notes February 9, 2017

Administrative Notes February 9, 2017 Administrative Notes February 9, 2017 Feb 10: Project proposal resubmission (optional) Feb 13: Art and Images reading quiz Feb 17: In the News call #2 Data Representation: Part 2 Text representation Colour

More information

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

PrepAwayExam.   High-efficient Exam Materials are the best high pass-rate Exam Dumps PrepAwayExam http://www.prepawayexam.com/ High-efficient Exam Materials are the best high pass-rate Exam Dumps Exam : I10-003 Title : XML Master Professional Database Administrator Vendors : XML Master

More information

Alis Technologies. UTF-16, an encoding of ISO Status of this Memo

Alis Technologies. UTF-16, an encoding of ISO Status of this Memo Internet Draft December 13, 1998 Paul Hoffman Internet Mail Consortium Francois Yergeau Alis Technologies UTF-16, an encoding of ISO 10646 Status of this Memo This document

More information

XML Metadata Standards and Topic Maps

XML Metadata Standards and Topic Maps XML Metadata Standards and Topic Maps Erik Wilde 16.7.2001 XML Metadata Standards and Topic Maps 1 Outline what is XML? a syntax (not a data model!) what is the data model behind XML? XML Information Set

More information

Chapter 2: Open Source Concepts

Chapter 2: Open Source Concepts Chapter 2: Open Source Concepts Informatics Practices Class XII (CBSE Board) Revised as per CBSE Curriculum 2015 Visit www.ip4you.blogspot.com for more. Authored By:- Rajesh Kumar Mishra, PGT (Comp.Sc.)

More information

Yong Kyu Lee, Keum Suk Lee, Young Sik Hong Computer Engineering Dept. and Electronic Buddhist Text Institute (EBTI)

Yong Kyu Lee, Keum Suk Lee, Young Sik Hong Computer Engineering Dept. and Electronic Buddhist Text Institute (EBTI) The Hanguk Pulgyo Chonso and the Hangul Tripitaka (the Korean Ancient Buddhist Corpus and the Korean Translation of the Koryo Buddhist Canon) on the WWW Yong Kyu Lee, Keum Suk Lee, Young Sik Hong Computer

More information

ISO INTERNATIONAL STANDARD. Document management Electronic document file format for long-term preservation Part 1: Use of PDF 1.

ISO INTERNATIONAL STANDARD. Document management Electronic document file format for long-term preservation Part 1: Use of PDF 1. INTERNATIONAL STANDARD ISO 19005-1 First edition 2005-10-01 Document management Electronic document file format for long-term preservation Part 1: Use of PDF 1.4 (PDF/A-1) Gestion de documents Format de

More information

Representing Characters, Strings and Text

Representing Characters, Strings and Text Çetin Kaya Koç http://koclab.cs.ucsb.edu/teaching/cs192 koc@cs.ucsb.edu Çetin Kaya Koç http://koclab.cs.ucsb.edu Fall 2016 1 / 19 Representing and Processing Text Representation of text predates the use

More information

uptex Unicode version of ptex with CJK extensions

uptex Unicode version of ptex with CJK extensions uptex Unicode version of ptex with CJK extensions Takuji Tanaka uptex project Oct 26, 2013 Takuji Tanaka (uptex project) uptex Unicode version of ptex with CJK extensions Oct 26, 2013 1 / 42 Outline /

More information

From SGML to HTML and back. From SGML to HTML

From SGML to HTML and back. From SGML to HTML Surfing inside the Web From SGML to HTML and back Hans C. Arents Office Future International Services Atlas Park, Weiveldlaan 41 B. 32, B-1930 Zaventem, Belgium Tel: +32 (0)2 725 40 25 -Fax: +32 (0)2 725

More information

Practical character sets

Practical character sets Practical character sets In MySQL, on the web, and everywhere Domas Mituzas MySQL @ Sun Microsystems Wikimedia Foundation It seems simple a b c d e f a ą b c č d e ę ė f а б ц д е ф פ ע ד צ ב א... ---...

More information

Introduction Introduction to XML

Introduction Introduction to XML Introduction Introduction to XML Lecture "XML in Communication Systems" Chapter 1 Dr.-Ing. Jesper Zedlitz Research Group for Communication Systems Dept. of Computer Science Christian-Albrechts-University

More information
