This document is a preliminary proposal to encode two characters into Unicode.

Similar documents
Communication through the language barrier in some particular circumstances by means of encoded localizable sentences

Henry looks at Edith and smiles, with some embarrassment. It s alright, I m not going to criticise. You are using a Private Use Area characters in

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras

Introduction to Programming Style

Hotmail Documentation Style Guide

ISO/IEC TR TECHNICAL REPORT. Software and systems engineering Life cycle management Guidelines for process description

Tutorial: Functions and Functional Abstraction. Nathaniel Osgood CMPT

Working with Track Changes: A Guide

The semicolon [ ; ] is a powerful mark of punctuation with three uses.

Heuristic Evaluation of [ Quest ]

Reply to L2/10-327: Comments on L2/10-280, Proposal to Add Variation Sequences... 1

Heuristic Evaluation of Math Out of the Box

Alfresco Voice Writing for the user interface

12/03/2017 Icd-10 codes for physical therapy acl reconstruction 12/04/2017 Comparative doses of lisinopril and benazepril 12/05/2017-Amazon music

The Unicode Standard Version 12.0 Core Specification

Using Outlook Live

Designing Posters TIDI Development Research Week

What is XHTML? XHTML is the language used to create and organize a web page:

Printing Envelopes in Microsoft Word

ONIX for Books Product Information Message. Application Note: Embedding HTML markup in ONIX 3.0 data elements

How to Write Engaging s

Artificial Intelligence Prof. Deepak Khemani Department of Computer Science and Engineering Indian Institute of Technology, Madras

1.1 Information representation

MISB ST STANDARD. 29 October Ancillary Text Metadata Sets. 1 Scope. 2 References. 3 Acronyms. 4 Definitions

9. MATHEMATICIANS ARE FOND OF COLLECTIONS

Objectives. Coding Standards. Why coding standards? Elements of Java Style. Understand motivation for coding standards

SMART CONTENT INBOUND TRAINING DAY WORKBOOK. Angela Hicks HubSpot Academy

ECC Style Guide. ECC Style Guide

(Refer Slide Time: 01:25)

Business letter spacing guide

BUSINESS WRITING. Business Writing. Business Writing. Business Writing. Business Writing NEAR EAST UNIVERSITY

GOOGLE APPS. GETTING STARTED Page 02 Prerequisites What You Will Learn. INTRODUCTION Page 03 What is Google? SETTING UP AN ACCOUNT Page 03 Gmail

9/17/2018. Source: etiquette-important. Source:

Introduction to Object-Oriented Modelling and UML

Introduction Building and Using Databases for historical research December 2012

Heuristic Evaluation of Covalence

(Refer Slide Time: 06:01)

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

List Building Starter Course. Lesson 2. Writing Your Campaign. Sean Mize

HANDOUT: COMPUTER PARTS

[Draft] RDA Resource Description and Access

CPS122 Lecture: Course Intro; Introduction to Object-Orientation

Strong signs your website needs a professional redesign

Proofwriting Checklist

CTW in Dasher: Summary and results.

Researching New SIP Carriers

shortcut Tap into learning NOW! Visit for a complete list of Short Cuts. Your Short Cut to Knowledge

Mega International Commercial bank (Canada)

Building a website. Should you build your own website?

Information Extraction Techniques in Terrorism Surveillance

An Interesting Way to Combine Numbers

Custom Fields With Virtuemart 2. Simple Custom Fields. Creating a Custom Field Type

Concordia University. Engineering & Computer Science WRITING ABOUT NUMBERS BCEE 6961 GRADUATE SEMINAR IN BUILDING AND CIVIL ENGINEERING

This is the LYX Beamer Template

SOME TYPES AND USES OF DATA MODELS

TYPO GRA PHY THE ANATOMY OF TYPE A BRIEF HISTORY OF TYPOGRAPHY WHAT IS YOUR TYPE ACTUALLY SAYING? OPEN FONT DISCUSSION

WBJS Grammar Glossary Punctuation Section

PowerPoint Basics: Create a Photo Slide Show

Heuristic Evaluation of meetchewthere

Using web-based

MANAGING YOUR MAILBOX: TRIMMING AN OUT OF CONTROL MAILBOX

The name of our class will be Yo. Type that in where it says Class Name. Don t hit the OK button yet.

CFMG Training Modules Classified Ad Strategy Module

14.1 Encoding for different models of computation

New user introduction to Attend

15 July, Huffman Trees. Heaps

(Refer Slide Time: 00:01:53)

The Unicode Standard Version 11.0 Core Specification

A Tutorial on SCSI-3 Persistent Group Reservations. (Version 1.0) by Lee Duncan, SUSE Labs December, 2012

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

GETTING STARTED PAGE 2 Prerequisites You Will Be Able To. INTRODUCTION PAGE 3 What is ? Pros, Cons and Tips Text Speak Dictionary

Notebook Assignments

Business Description 5 Paragraph Essay, MLA Way

Best Practices for. Membership Renewals

Lesson 1 - Getting Started

Sets. Sets. Examples. 5 2 {2, 3, 5} 2, 3 2 {2, 3, 5} 1 /2 {2, 3, 5}

Exercise: Graphing and Least Squares Fitting in Quattro Pro

Attachment (noun): is a computer file (such as word or excel) which is sent together with an .

CS112 Lecture: Defining Instantiable Classes

Assignment 0. Nothing here to hand in

PowerPoint. presentation

Getting Started With Syntax October 15, 2015

Editorial Style. An Overview of Hofstra Law s Editorial Style and Best Practices for Writing for the Web. Office of Communications July 30, 2013

Effective Communication: Seven Cs

Firefighters Pensions (England) Scheme Advisory Board Paper 2. SAB communication and branding guidelines

XML 2 APPLICATION. Chapter SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Access Intermediate

Week - 01 Lecture - 04 Downloading and installing Python

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute

The last page of the internet? The importance of Preserving the Dynamic Aspects of the Internet

from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington

Projective geometry and the extended Euclidean plane

Chapter 3 - Simple JavaScript - Programming Basics. Lesson 1 - JavaScript: What is it and what does it look like?

Distribution From: Suzanne L. Krupp Date: October 1, 1980 Subject: A Basic Video-Terminal Editor for Menu Applications

Systems and software engineering Vocabulary

Design First ITS Instructor Tool

1. What type of error produces incorrect results but does not prevent the program from running? a. syntax b. logic c. grammatical d.

Understanding and Exploring Memory Hierarchies

A Short Introduction to CATMA

The language of

Transcription:

A preliminary proposal to encode two base characters William J G Overington 19 October 2015 1. Introduction This document is a preliminary proposal to encode two characters into Unicode. The two characters are each intended to be used as a base character for an encoding space produced by using a sequence consisting of a Unicode base character followed by some Unicode tag characters. I am hoping that each of the two base characters will eventually become encoded into plane 14, though I am not asking the Unicode Technical Committee (hereinafter, UTC) to encode the two characters straight away. The reason for not asking the UTC to encode the two characters straight away is that in order to be applied each of the two characters needs to be followed by a sequence of tag characters: the encoding of the sequence of tag characters for each of the two base characters to be by applying information from an ISO standard, the ISO standard for each of the two base characters not being the same ISO standard: the two ISO standards do not exist at this time. The two supposed ISO standards need not refer to electronic communication in any way. One would be a plain paper list of words and code numbers. The other would be a plain paper list of preset sentences and code numbers. The two plain paper lists could in principle be applied in various ways, though my motivation in seeking for them to become produced is so that they could be applied in plain text electronic communication. A good result from discussion of this document by the UTC, from my viewpoint, would be for the UTC to agree to keep the matter in escrow so that if I can persuade ISO to encode a plain paper list of words and code numbers and a plain paper list of preset sentences and code numbers, then the UTC would at that time encode the two base characters into plane 14 so that the two plain paper lists could each be applied to produce a tagspace accessed by a plane 14 character. A tagspace is defined in section 2 of this document, on page 2 of this document. If the UTC were to decide that, or something approaching that, I would then be able to approach ISO with first draft proposals for the two plain paper lists, not comprehensive, more placeholder applications so as to try to get things started, saying to ISO that encoding into Unicode of the two base characters would become possible. The two placeholder plain paper first drafts could then be gradually altered, maybe completely altered, and augmented so as to be able to get things started, just like Unicode started small and has been extended over time. I opine that encoding uniquely into plain text is important if these two tagspaces are to become used to maximum capability. I have considered markup systems yet, in my opinion, these do not have the maximum flexibility and opportunity for widespread use as do tagspaces encoded directly into Unicode and the International Standard. Page 1 of 6 of document of 19 October 2015 by William J G Overington

2. The concept of a tagspace defined In this document, a tagspace is defined as follows. tagspace: An encoding space produced by using a sequence consisting of a Unicode base character followed by one or more Unicode tag characters. This document suggests the encoding of two Unicode base characters, each of which would be used to produce a tagspace. The two tagspaces thus produced would be separate and would each stand alone, yet could be used together in some circumstances. 3. The suggested PANLEX BASE CHARACTER character The first base character for a tagspace is PANLEX BASE CHARACTER and the tagspace would encode in a language-independent form each word that exists in a language. The format for encoding in the tagspace is not suggested in this document. However, in an explanation of the concept a sequence of tag digit characters has been used, sometimes a TAG DIGIT SEVEN character followed by seven tag digit characters, sometimes a TAG DIGIT ONE character followed by one tag digit character, sometimes a TAG DIGIT TWO character followed by two tag digit characters, and so on. Here is a suggested visible glyph for the character. The glyph would not typically be displayed in a software-assisted application scenario yet is included for completeness as it could be useful for a graceful fallback display in some circumstances. In relation to PanLex words, here is an example, the code number chosen is just for this example, it is not intended to go through to encoding. For example, consider the word apricot. Suppose that that is encoded as the number 72519430 in the PanLex listing. Here the 7 is because there are seven digits following the 7, the 0 is because it is the general word, and the 251943 is just chosen as a number to distinguish the word apricot from other words. The 251943 is just an example number chosen for this explanation. The use of the leading 7 is because more frequently used words could have lower numbers. For example, the word and could have the number, say, 12 and the word today could have the number, say, 245 and so on. Page 2 of 6 of document of 19 October 2015 by William J G Overington

Returning to apricot. There could be variations so as to disambiguate. For example, 72519430 apricot 72519431 apricot (fruit) 72519432 apricot (tree) 72519433 apricot (colour) 72519434 apricot (flavour) Suppose that the PANLEX BASE CHARACTER were to become encoded into Unicode. Then an apricot in the sense of an apricot tree would, just here continuing with the code number of this explanation, be encoded into plain text using the following nine characters. PANLEX BASE CHARACTER, TAG DIGIT SEVEN, TAG DIGIT TWO, TAG DIGIT FIVE, TAG DIGIT ONE, TAG DIGIT NINE, TAG DIGIT FOUR, TAG DIGIT THREE, TAG DIGIT TWO The commas are just for clarity in this description, they are not in the message. This seems a lot, yet this could be from a cascading menu system where the person sending the message would select the word apricot (tree) from a menu and the software would insert the nine characters into the message automatically. 4. The suggested LOCALIZABLE SENTENCE BASE CHARACTER character The second base character for a tagspace is LOCALIZABLE SENTENCE BASE CHARACTER and the tagspace would encode in a language-independent form a finite subset of the grammatically stand-alone complete whole sentences that can be expressed in a language. The format for encoding in the tagspace is not suggested in this document. However, in an explanation of the concept a sequence of five tag digit characters has been used. Here is a suggested visible glyph for the character. The glyph would not typically be displayed in a software-assisted application scenario yet is included for completeness as it could be useful for a graceful fallback display in some circumstances. Page 3 of 6 of document of 19 October 2015 by William J G Overington

5. Examples of how encoded items in the Localizable Sentence tagspace could be used A good selection of preset grammatically stand-alone complete whole sentences that can be expressed in a language would be encoded in a language-independent form. Please note that the sentences are here expressed in capitals as in an encoding standard. A localized version would use ordinary conventions as to the use of uppercase letters and lowercase letters in those languages where two cases of letters are normally used. These sentences could include everyday items such as GOOD DAY. and BEST REGARDS, to sentences such as IS THERE ANY INFORMATION ABOUT THE FOLLOWING PERSON PLEASE? in a situation where someone is seeking news of a relative or friend after a disaster in another country. An example of use. Is there any information about the following person please? Margaret Gattenford Here a localizable sentence and the name of a person are used together. The localizable sentence would become localized into the local language yet the name of the person would go through unaltered. 6. Examples of how encoded items in the PanLex tagspace and the Localizable Sentence tagspace could be used together This would require some additional items to become encoded into the Localizable Sentence tagspace, yet no additional items would need to become encoded into the PanLex tagspace. This would open up lots of possibilities for communicating through the language barrier. There would be some situations where a PanLex word would follow a localizable sentence. For example, the following localizable sentence. THE PERSON NEEDS THE FOLLOWING MEDICINE. That could be followed by a PanLex word. Also, there could be a general mechanism so as to be able to send many messages through the language barrier. I deliberately did not write that any message could be sent as there could be limits for some very complicated sentences: the scope of what is possible could hopefully become increased as research continues, yet there may perhaps always be some messages that could not be sent in this manner. Page 4 of 6 of document of 19 October 2015 by William J G Overington

The general mechanism would be somewhat more complicated to use, yet maybe software assistance could in the future make this straightforward to apply. For example, suppose that one wishes to send through the language barrier the following sentence. The apricot tree is beautiful. One would need to construct the sentence. There could be localizable sentences for constructing such a sentence. For example, the following. The subject noun of the sentence being constructed is as follows. That would be encoded as LOCALIZABLE SENTENCE BASE CHARACTER followed by a sequence of tag characters to represent the code for the localizable sentence. That localizable sentence would be followed by PANLEX BASE CHARACTER followed by a sequence of tag characters. LOCALIZABLE SENTENCE BASE CHARACTER, TAG DIGIT SIX, TAG DIGIT SEVEN, TAG DIGIT ZERO, TAG DIGIT ZERO, TAG DIGIT ONE PANLEX BASE CHARACTER, TAG DIGIT SEVEN, TAG DIGIT TWO, TAG DIGIT FIVE, TAG DIGIT ONE, TAG DIGIT NINE, TAG DIGIT FOUR, TAG DIGIT THREE, TAG DIGIT TWO That is a lot of characters to send the message that the subject noun of the sentence being constructed is an apricot tree, yet whereas localizable sentences are intended for a limited set of sentences, (some just for friendly chat; some for serious applications, such as, for example, seeking news of relatives and friends after a disaster) this combination has the potential to send a message involving any words. I mentioned a localizable sentence as follows. The subject noun of the sentence being constructed is as follows. There could be a number of similar sentences, such as, as examples, the following five sentences. The direct object noun of the sentence being constructed is as follows. The verb of the sentence being constructed is as follows. The verb is in the present tense. The verb is in the future tense. The day of the week is as follows. This technique is interesting because it puts the activity of localizing the sentence about the apricot tree into the target language, as the activity of the recipient of the message, who would typically be a native speaker of the language. Page 5 of 6 of document of 19 October 2015 by William J G Overington

Also the same message could be sent to more than one recipient, where each of the recipients may not necessarily localize the sentence into the same target language. 7. Conclusions This document suggests possibilities for the future. I ask that consideration please be given to the long-term usefulness of these possibilities in software-assisted communication through the language barrier. Page 6 of 6 of document of 19 October 2015 by William J G Overington