BiDi in the Wild: Challenges of the Unicode Bidirectional Algorithm
Moriel Schottlender, Software Engineer



Wikipedia's Right-to-Left support

Right-to-Left Wikipedias
~260 Wikipedias in Left-to-Right
~17 Wikipedias in Right-to-Left
Arabic Wikipedia: ~1,000,000 users, ~375,000 articles
Persian Wikipedia: ~514,000 users, ~460,000 articles
Hebrew Wikipedia: ~277,000 users, ~175,000 articles

Editing Right-to-Left Wikipedias

A brief history of Right-to-Left support online

Long long ago:
Computers mostly only knew Left-to-Right.
Supporting non-Latin scripts required special fonts.
There was no real Right-to-Left support.
Solution: writing backwards, as in "enod eb ot dah gnihtemos".
Something had to be done.

Pre-BiDi solution: visual and logical encoding
Visual: שלום עולם (characters stored in on-screen order, left to right; someone had to type this backwards!)
Logical: שלום עולם (characters stored in reading order)
(שלום עולם = "Hello world")
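For a line that contains only right-to-left letters and spaces, the visual encoding is simply the logical string reversed. The minimal Python sketch below shows this; `to_visual_pure_rtl` is a hypothetical helper name, not an existing API.

```python
def to_visual_pure_rtl(logical: str) -> str:
    """Hypothetical helper: convert a logically ordered line that contains
    only right-to-left letters and spaces into visual (left-to-right
    storage) order. For such a line this is just a reversal."""
    return logical[::-1]


logical = "שלום עולם"  # "Hello world", stored in reading order
visual = to_visual_pure_rtl(logical)

# In logical order the first stored character is the first one read (ש);
# in visual order it is the leftmost one on screen (ם, the end of עולם).
print(logical[0], visual[0])
```

Plain reversal stops working as soon as a line contains digits or Latin text, since those must still read left to right; handling that mix is the job of the Bidirectional Algorithm.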

Unicode Bidirectional Algorithm
If all text on a page is uniform (all RTL or all LTR), the display order of the text is unambiguous. But:
RTL content can include digits (written LTR).
RTL content can be mixed with LTR content.
Examples: אני הולכת להרצות ב Unicode Conference ב Santa Clara ("I'm going to speak at the Unicode Conference in Santa Clara"); צריך להתקשר ל ("Need to call")
The Bidirectional Algorithm is meant to resolve this ambiguity in rendering order.

Quick primer to BiDi entity types
Strong: affect the directionality of entities around them. Examples: alphabet letters.
Weak: do not affect the directionality of entities around them. Examples: punctuation*, digits.
Neutral: take the directionality of the context they're in. Examples: space, newline, tab, etc.
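Python's standard `unicodedata.bidirectional()` returns a character's Bidi_Class, which can be grouped into the strong / weak / neutral categories above. The grouping below follows the character-type table in UAX #9 (explicit formatting characters form a fourth group); the helper name `bidi_category` is our own. It also hints at the asterisk on "punctuation": parentheses and `!` are actually neutral (class ON), while separators such as `,` and `-` are weak (CS and ES).

```python
import unicodedata

# Bidi_Class values grouped as in the bidirectional character types table of UAX #9.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_category(ch: str) -> str:
    """Hypothetical helper: 'strong', 'weak', 'neutral' or 'explicit' for one character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    if cls in EXPLICIT:
        return "explicit"
    return "unknown"  # e.g. unassigned code points, for which Python returns ""


for ch in ["a", "ש", "ع", "1", ",", "-", "(", "!", " ", "\t", "\n"]:
    print(repr(ch), unicodedata.bidirectional(ch), bidi_category(ch))
```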

Numbers
LTR: 123 עברית
RTL: 123 עברית
The digits "123" keep their left-to-right order in both contexts. Whitespace is neutral, so it takes the direction of its surroundings.
(עברית = "Hebrew")
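A simplified sketch of how the digits survive in an RTL line: reverse the whole line, then flip each run of LTR letters or European digits back so it still reads left to right. This is not the full Bidirectional Algorithm, and `visual_order_rtl_line` is a hypothetical name.

```python
import unicodedata


def visual_order_rtl_line(logical: str) -> str:
    """Simplified sketch, not the full Bidirectional Algorithm: lay out one
    right-to-left line in left-to-right (visual) storage order.

    The whole line is reversed, then every run of strong-LTR letters (L)
    or European digits (EN) is flipped back so it still reads left-to-right.
    Every other character, including spaces, ends a run.
    """
    out: list[str] = []
    run: list[str] = []
    for ch in reversed(logical):
        if unicodedata.bidirectional(ch) in ("L", "EN"):
            run.append(ch)
        else:
            out.extend(reversed(run))  # restore LTR order of the finished run
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


print(visual_order_rtl_line("עברית 123"))  # "123 תירבע": the digits keep their order
```

Because every space ends a run, a multi-word English phrase inside Hebrew text would come out with its words in the wrong order. Getting that right requires resolving neutral characters from their neighbours, which is what the algorithm's rules N1 and N2 do.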

Text and numbers
English Hebrew English
English עברית English
Strong, neutral, strong, neutral, strong: the spaces are neutral, so the line resolves into three runs: LTR, RTL, LTR.
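The neutral spaces in "English עברית English" get their direction from their neighbours. Below is a simplified sketch of UAX #9 rules N1 and N2, limited to strong letters and neutrals (no digits, brackets, or control characters); `resolve_runs` is a hypothetical helper, and the paragraph direction is used for the start and end of the text.

```python
import unicodedata


def resolve_runs(text: str, paragraph: str = "L") -> list[tuple[str, str]]:
    """Simplified sketch of UAX #9 rules N1/N2, for text made only of strong
    letters and neutrals (no digits, brackets or control characters).

    A run of neutrals takes the direction of the strong characters on both
    sides when they agree (N1) and the paragraph direction otherwise (N2).
    The paragraph direction ("L" or "R") also stands in for the start and
    end of the text. Returns (substring, direction) pairs.
    """
    # Strong direction of each character, or None for a neutral.
    types = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            types.append("L")
        elif cls in ("R", "AL"):
            types.append("R")
        else:
            types.append(None)

    resolved = list(types)
    i = 0
    while i < len(types):
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < len(types) and types[j] is None:  # find the end of the neutral run
            j += 1
        before = types[i - 1] if i > 0 else paragraph
        after = types[j] if j < len(types) else paragraph
        for k in range(i, j):
            resolved[k] = before if before == after else paragraph
        i = j

    # Group consecutive characters with the same resolved direction.
    runs: list[tuple[str, str]] = []
    for ch, direction in zip(text, resolved):
        if runs and runs[-1][1] == direction:
            runs[-1] = (runs[-1][0] + ch, direction)
        else:
            runs.append((ch, direction))
    return runs


print(resolve_runs("English עברית English"))
```

In an LTR paragraph the spaces join the English runs; in an RTL paragraph (`resolve_runs(text, "R")`) they join the Hebrew run instead.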

The confusing issue of the parentheses
Demo

Parentheses
LTR: (hello)
RTL: ( )שלום (שלום = "hello")
Good luck with HTML.
Or math comparisons.
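Parentheses, angle brackets, and similar paired characters carry the Unicode Bidi_Mirrored property: when one of them resolves to right-to-left, the renderer draws the mirrored glyph, so `(` looks like `)` and `<` looks like `>` (UAX #9, rule L4). That is what makes HTML tags and comparisons such as `x < y` fragile in RTL text. A small sketch using the standard `unicodedata.mirrored()`; the helper name is hypothetical.

```python
import unicodedata


def mirrored_chars(text: str) -> list[str]:
    """Hypothetical helper: the characters of text with Bidi_Mirrored=Yes,
    i.e. those drawn with a mirrored glyph if they resolve to right-to-left."""
    return [ch for ch in text if unicodedata.mirrored(ch)]


print(mirrored_chars('<a title="foo">bar</a>'))  # ['<', '>', '<', '>']
print(mirrored_chars("x < y"))                    # ['<']
print(mirrored_chars("(hello)"))                  # ['(', ')']
```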

LTR: <a href=" title="foo">bar</a>
RTL: <a href=" title="foo"> <שלום /a> (LTR, LTR)
RTL: <a href=" title=" <אהלן"<שלום /a> (LTR, LTR)
(שלום = "hello", אהלן = "hi")

RTL: [[! שמאל זו אני Moriel schottlender.jpg 250px: ]]קובץ
Direction of the pieces: RTL, RTL, LTR, RTL
(A wikitext image link; קובץ = "File", שמאל = "left", זו אני = "This is me")

Your brain on BiDi (image credit: U.S. Navy photo by Photographer's Mate 2nd Class Aaron Peterson. Public Domain.)

The tale of an animated bitmap
The file shows up as "fig.bmp", but its real name is U+202E RIGHT-TO-LEFT OVERRIDE followed by "pmb.gif" (written "\u202epmb.gif" in Python). The override forces "pmb.gif" to display right-to-left, so it reads "fig.bmp". The actual extension is .gif, and the file is an animated GIF.

    #!/usr/bin/env python
    # Copy an animated GIF to a name that begins with U+202E RIGHT-TO-LEFT OVERRIDE.
    import shutil

    shutil.copy("animated.gif", u"\u202epmb.gif")

Code by David Chan
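A sketch of how a program might flag names like this one: list every directional formatting character in a string, and approximate what a name that starts with RLO looks like on screen. Both helper names are hypothetical, and `rlo_display` only covers the simple case of a tail with no other controls and no mirrored characters such as parentheses.

```python
import unicodedata

# Bidi_Class values of the explicit directional formatting characters (UAX #9).
EXPLICIT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}
# LRM and RLM have Bidi_Class L and R, so they are matched by code point instead.
IMPLICIT_MARKS = {"\u200e", "\u200f"}


def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Hypothetical helper: (index, Unicode name) of each directional control."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(name)
        if ch in IMPLICIT_MARKS or unicodedata.bidirectional(ch) in EXPLICIT_CLASSES
    ]


def rlo_display(name: str) -> str:
    """Hypothetical helper approximating the on-screen text of a name that
    starts with RIGHT-TO-LEFT OVERRIDE and contains no other controls or
    mirrored characters: everything after the RLO is shown reversed."""
    if name.startswith("\u202e"):
        return name[1:][::-1]
    return name


suspicious = "\u202epmb.gif"
print(find_bidi_controls(suspicious))  # [(0, 'RIGHT-TO-LEFT OVERRIDE')]
print(rlo_display(suspicious))         # fig.bmp
```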

Control characters
Implicit directional formatting:
  U+200E: LEFT-TO-RIGHT MARK (LRM)
  U+200F: RIGHT-TO-LEFT MARK (RLM)
Explicit directional embedding:
  U+202A: LEFT-TO-RIGHT EMBEDDING (LRE)
  U+202B: RIGHT-TO-LEFT EMBEDDING (RLE)
  U+202C: POP DIRECTIONAL FORMATTING (PDF)
Explicit directional override:
  U+202D: LEFT-TO-RIGHT OVERRIDE (LRO)
  U+202E: RIGHT-TO-LEFT OVERRIDE (RLO)
Explicit directional isolate:
  U+2066: LEFT-TO-RIGHT ISOLATE (LRI)
  U+2067: RIGHT-TO-LEFT ISOLATE (RLI)
  U+2068: FIRST STRONG ISOLATE (FSI)
  U+2069: POP DIRECTIONAL ISOLATE (PDI)
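The list of control characters can be checked against the Unicode Character Database shipped with Python. Note that LRM and RLM are ordinary strong characters (Bidi_Class L and R), while each of the other controls has its own Bidi_Class, equal to its abbreviation. A short sketch:

```python
import unicodedata

# The directional formatting characters, keyed by code point.
BIDI_CONTROLS = {
    0x200E: "LRM", 0x200F: "RLM",
    0x202A: "LRE", 0x202B: "RLE", 0x202C: "PDF",
    0x202D: "LRO", 0x202E: "RLO",
    0x2066: "LRI", 0x2067: "RLI", 0x2068: "FSI", 0x2069: "PDI",
}

for cp, abbr in BIDI_CONTROLS.items():
    ch = chr(cp)
    # LRM/RLM report Bidi_Class L and R; the others report their abbreviation.
    print(f"U+{cp:04X} {abbr:<4}{unicodedata.bidirectional(ch):<4}{unicodedata.name(ch)}")
```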

When BiDi is technically correct but practically wrong

Solution: force isolation
New topic created on [board name]: <bdi>[topic title]</bdi>
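A sketch of the same fix in code: the hypothetical `notification_html` helper wraps the user-supplied title in `<bdi>`, and `notification_text` does the equivalent for plain text with FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069). The function names and the message format are illustrative assumptions.

```python
import html


def notification_html(board_name: str, topic_title: str) -> str:
    """Hypothetical helper: isolate the user-supplied title with <bdi> so its
    direction is computed on its own and cannot reorder the rest of the line."""
    return (f"New topic created on {html.escape(board_name)}: "
            f"<bdi>{html.escape(topic_title)}</bdi>")


def notification_text(board_name: str, topic_title: str) -> str:
    """Plain-text equivalent: wrap the title in FSI (U+2068) ... PDI (U+2069)."""
    return f"New topic created on {board_name}: \u2068{topic_title}\u2069"


print(notification_html("Sandbox", "שלום (1)"))
```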

Dates
Displayed: 8:40, במאי IST 28 אל TLV (runs: RTL, LTR, RTL, LTR)
Intended: TLV to IST, 28 May, 8:40 (אל = "to", במאי = "in May")

LTR in RTL clients
[Figure: the same content shown in an LTR client and in an RTL client]
Solution: always define content directionality.
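One way to always define directionality: pick it from the first strong character, in the spirit of UAX #9 rules P2 and P3, and emit an explicit `dir` attribute instead of letting content inherit the client's direction. HTML's `dir="auto"` uses a similar first-strong heuristic. The helpers below are hypothetical sketches; unlike P2, `first_strong_direction` does not skip text inside isolates.

```python
import html
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Simplified version of UAX #9 rules P2/P3: 'ltr' or 'rtl' depending on
    the first strong character, or None if there is none. Unlike P2, this
    sketch does not skip text inside isolates."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None


def render_message(text: str, fallback: str = "ltr") -> str:
    """Hypothetical helper: always emit an explicit dir attribute."""
    direction = first_strong_direction(text) or fallback
    return f'<p dir="{direction}">{html.escape(text)}</p>'


print(render_message("123 עברית"))
```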

Applications implement BiDi inconsistently

Inconsistent implementation of BiDi (Facebook)
Web: Maximum 2 Terms
Mobile: Terms 2 Maximum
BiDi not implemented???

Inconsistent automatic detection of direction (Google Hangouts)
Desktop: no auto-flip, auto-flip
Mobile: auto-flip, auto-flip

Even real life ignores BiDi (and Unicode) a lot
מסורת
سنت מסורת
التقلید
(מסורת, سنت, and التقلید all mean "tradition")

When BiDi itself is confusing

Numbers, math and phone numbers
Numbers are rendered Left-to-Right even in (most) Right-to-Left contexts.
LTR: Phone number
RTL: מספר טלפון ("phone number")
Plus / minus signs are weak.
LTR: There are 1-2 things but 4-5 others
RTL: אחרים 5-4 דברים אבל 1-2 יש (the same sentence in Hebrew)
Spaces around the minus sign change the result: that range is displayed flipped, as "5-4".
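Why do spaces matter? UAX #9 rule W4 turns a single separator between two European digits into a digit, so "1-2" stays one left-to-right number. With spaces around the minus sign, "4 - 5" is two separate numbers, and in an RTL context numbers count as RTL when resolving the neutrals between them, so the pieces are laid out right to left. A simplified check of W4; the helper name is hypothetical, and the earlier weak rules (W1 to W3) and Arabic digits are ignored.

```python
import unicodedata


def w4_joined_separators(text: str) -> list[int]:
    """Simplified check of UAX #9 rule W4: indices of a single European
    separator (ES, e.g. '+' or '-') or common separator (CS, e.g. '.' or ',')
    that sits directly between two European digits (EN). W4 turns such a
    separator into a digit, so "1-2" stays a single number run."""
    joined = []
    for i in range(1, len(text) - 1):
        if (unicodedata.bidirectional(text[i]) in ("ES", "CS")
                and unicodedata.bidirectional(text[i - 1]) == "EN"
                and unicodedata.bidirectional(text[i + 1]) == "EN"):
            joined.append(i)
    return joined


print(w4_joined_separators("1-2"))    # [1]: the '-' joins the two digits
print(w4_joined_separators("4 - 5"))  # []: spaces keep the numbers apart
```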

Printed abroad / Printed in Israel

Bonus: Emoticons
LTR: :)  :(  :D  :P
RTL: (:  ):  D:  P:

(Bonus fail)
Web
Android: Google Drive
Android: Google Slides

RTL users expect nothing good
RTL users are used to their computers not quite cooperating


Now what?
Consistency in implementing the Bidirectional Algorithm
A standard for predicting directionality while typing
Improving isolation of numbers and dates
Consistent punctuation within sentences (we solved jumping parentheses; let's solve periods, commas, and colons!)

Remember parentheses...
LTR: MSchottlender (WMF)
RTL: (MSchottlender (WMF
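The jumping parenthesis can be reproduced with a toy reordering of a single line in an RTL paragraph: resolve neutrals from their strong neighbours (rules N1 and N2), assign levels, reverse runs (rule L2), and mirror RTL characters (rule L4). This sketch deliberately leaves out rule N0, the paired-bracket rule added in Unicode 6.3, as well as digits and control characters, so it behaves like an implementation that predates N0. `display_rtl_line` and the partial mirroring table are assumptions made for illustration.

```python
import unicodedata

# Tiny subset of BidiMirroring.txt, enough for this demo.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{", "<": ">", ">": "<"}


def strong_type(ch: str):
    """'L' or 'R' for strong characters, None for everything else."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None  # simplification: digits, punctuation, spaces all treated as neutral


def display_rtl_line(text: str) -> str:
    """Toy visual order of one line in an RTL paragraph (no rule N0)."""
    types = [strong_type(ch) for ch in text]

    # N1/N2: a neutral takes the direction of the nearest strong characters on
    # both sides if they agree, otherwise the paragraph direction, R.
    # The paragraph direction also stands in for the start and end of the line.
    resolved = []
    for i, t in enumerate(types):
        if t is not None:
            resolved.append(t)
            continue
        before = next((types[j] for j in range(i - 1, -1, -1) if types[j]), "R")
        after = next((types[j] for j in range(i + 1, len(types)) if types[j]), "R")
        resolved.append(before if before == after else "R")

    # Paragraph level is 1: RTL characters stay at 1, LTR characters go to 2.
    levels = [2 if d == "L" else 1 for d in resolved]

    # L2: reverse every run at level >= 2, then every run at level >= 1.
    order = list(range(len(text)))
    for level in (2, 1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = reversed(order[i:j])
            i = j

    # L4: characters resolved to RTL are drawn with their mirrored glyph.
    return "".join(MIRROR.get(text[k], text[k]) if levels[k] == 1 else text[k]
                   for k in order)


for sample in ["MSchottlender (WMF)", ":)", ":D"]:
    print(sample, "->", display_rtl_line(sample))
```

The trailing `)` has LTR text before it but the end of the RTL paragraph after it, so it resolves to RTL, moves to the far left, and is drawn mirrored. The same logic turns `:)` into `(:` and `:D` into `D:` in the emoticon examples. With rule N0, the bracket pair in "MSchottlender (WMF)" takes the direction of the LTR text around it and the name displays intact.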

Keep RTLing


More information

JAVA.LANG.CHARACTER CLASS

JAVA.LANG.CHARACTER CLASS JAVA.LANG.CHARACTER CLASS http://www.tutorialspoint.com/java/lang/java_lang_character.htm Copyright tutorialspoint.com Introduction The java.lang.character class wraps a value of the primitive type char

More information

(Refer Slide Time: 01:40)

(Refer Slide Time: 01:40) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #25 Javascript Part I Today will be talking about a language

More information

Nastaleeq: A challenge accepted by Omega

Nastaleeq: A challenge accepted by Omega Nastaleeq: A challenge accepted by Omega Atif Gulzar, Shafiq ur Rahman Center for Research in Urdu Language Processing, National University of Computer and Emerging Sciences, Lahore, Pakistan atif dot

More information

AP CS P. Unit 2. Introduction to HTML and CSS

AP CS P. Unit 2. Introduction to HTML and CSS AP CS P. Unit 2. Introduction to HTML and CSS HTML (Hyper-Text Markup Language) uses a special set of instructions to define the structure and layout of a web document and specify how the document should

More information

Communication through the language barrier in some particular circumstances by means of encoded localizable sentences

Communication through the language barrier in some particular circumstances by means of encoded localizable sentences Communication through the language barrier in some particular circumstances by means of encoded localizable sentences William J G Overington 17 February 2014 This research document presents a system which

More information

Unicode Convertor Reference Manual

Unicode Convertor Reference Manual Unicode Convertor Reference Manual Kamban Software, Australia. All rights reserved. Kamban Software is a Registered Trade Mark. Registered in Australia, ABN: 18820534037 Introduction First let me thank

More information

Ch.2: Loops and lists

Ch.2: Loops and lists Ch.2: Loops and lists Joakim Sundnes 1,2 Hans Petter Langtangen 1,2 Simula Research Laboratory 1 University of Oslo, Dept. of Informatics 2 Aug 29, 2018 Plan for 28 August Short quiz on topics from last

More information

Variables, Functions and String Formatting

Variables, Functions and String Formatting Variables, Functions and String Formatting Code Examples HW 2-1, 2-2 Logical Expressions Comparison Operators a == b Comparison operators compare the right-hand side and the lefthand side and return True

More information

CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch

CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch Purpose: We will take a look at programming this week using a language called Scratch. Scratch is a programming language that was developed

More information

Table of Contents. Installation Global Office Mini-Tutorial Additional Information... 12

Table of Contents. Installation Global Office Mini-Tutorial Additional Information... 12 TM Table of Contents Installation... 1 Global Office Mini-Tutorial... 5 Additional Information... 12 Installing Global Suite The Global Suite installation program installs both Global Office and Global

More information

Unicode definition list

Unicode definition list abstract character D3 3.3 2 abstract character sequence D4 3.3 2 accent mark alphabet alphabetic property 4.10 2 alphabetic sorting annotation ANSI Arabic digit 1 Arabic-Indic digit 3.12 1 ASCII assigned

More information

(Photos and Instructions Based on Microsoft Outlook 2007, Gmail, Yahoo! Mail, and Hotmail)

(Photos and Instructions Based on Microsoft Outlook 2007, Gmail, Yahoo! Mail, and Hotmail) Specific instructions on how to compose a professional e-mail using send and reply options, basic e-mail components, appropriate wording, content, tone, and examples of what not to do. (Photos and Instructions

More information

Source coding and compression

Source coding and compression Computer Mathematics Week 5 Source coding and compression College of Information Science and Engineering Ritsumeikan University last week binary representations of signed numbers sign-magnitude, biased

More information

Slide Set 2. for ENCM 335 in Fall Steve Norman, PhD, PEng

Slide Set 2. for ENCM 335 in Fall Steve Norman, PhD, PEng Slide Set 2 for ENCM 335 in Fall 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary September 2018 ENCM 335 Fall 2018 Slide Set 2 slide

More information

TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA

TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA 1 TOPIC 2 INTRODUCTION TO JAVA AND DR JAVA Notes adapted from Introduction to Computing and Programming with Java: A Multimedia Approach by M. Guzdial and B. Ericson, and instructor materials prepared

More information

Lecture 25: Internationalization. UI Hall of Fame or Shame? Today s Topics. Internationalization Design challenges Implementation techniques

Lecture 25: Internationalization. UI Hall of Fame or Shame? Today s Topics. Internationalization Design challenges Implementation techniques Lecture 25: Internationalization Spring 2008 6.831 User Interface Design and Implementation 1 UI Hall of Fame or Shame? Our Hall of Fame or Shame candidate for the day is this interface for choosing how

More information

USER GUIDE MADCAP FLARE Language Support

USER GUIDE MADCAP FLARE Language Support USER GUIDE MADCAP FLARE 2018 Language Support Copyright 2018 MadCap Software. All rights reserved. Information in this document is subject to change without notice. The software described in this document

More information

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA

Transliteration of Tamil and Other Indic Scripts. Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA Transliteration of Tamil and Other Indic Scripts Ram Viswanadha Unicode Software Engineer IBM Globalization Center of Competency, California, USA Main points of Powerpoint presentation This talk gives

More information

CS112 Lecture: Variables, Expressions, Computation, Constants, Numeric Input-Output

CS112 Lecture: Variables, Expressions, Computation, Constants, Numeric Input-Output CS112 Lecture: Variables, Expressions, Computation, Constants, Numeric Input-Output Last revised January 12, 2006 Objectives: 1. To introduce arithmetic operators and expressions 2. To introduce variables

More information

An Introduction to Python (TEJ3M & TEJ4M)

An Introduction to Python (TEJ3M & TEJ4M) An Introduction to Python (TEJ3M & TEJ4M) What is a Programming Language? A high-level language is a programming language that enables a programmer to write programs that are more or less independent of

More information

CS102 Unit 2. Sets and Mathematical Formalism Programming Languages and Simple Program Execution

CS102 Unit 2. Sets and Mathematical Formalism Programming Languages and Simple Program Execution 1 CS102 Unit 2 Sets and Mathematical Formalism Programming Languages and Simple Program Execution 2 Review Show how "Hi!\n" would be stored in the memory below Use decimal to represent each byte Remember

More information

Introduction 1. Chapter 1

Introduction 1. Chapter 1 This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

3Using and Writing. Functions. Understanding Functions 41. In this chapter, I ll explain what functions are and how to use them.

3Using and Writing. Functions. Understanding Functions 41. In this chapter, I ll explain what functions are and how to use them. 3Using and Writing Functions Understanding Functions 41 Using Methods 42 Writing Custom Functions 46 Understanding Modular Functions 49 Making a Function Modular 50 Making a Function Return a Value 59

More information

Address Internationalization Technical Perspective. Universal Acceptance

Address Internationalization Technical Perspective. Universal Acceptance Email Address Internationalization Technical Perspective Universal Acceptance Warm-up Exercise Each of the 3 groups below contain lists of Top Level Domains (TLDs) that are valid (approved and delegated

More information

Rendering in Dzongkha

Rendering in Dzongkha Rendering in Dzongkha Pema Geyleg Department of Information Technology pema.geyleg@gmail.com Abstract The basic layout engine for Dzongkha script was created with the help of Mr. Karunakar. Here the layout

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly

More information

Setting Your Site Preferences

Setting Your Site Preferences Setting Your Site Preferences Created on October 1, 2016 This Page Intentionally Left Blank Setting Your Preferences ii 02-Jan-17 ii Table of Contents Introduction... 4 Opening the Setting or Preferences

More information

User Manual. Page-Turning ebook software for Mac and Windows platforms

User Manual. Page-Turning ebook software for Mac and Windows platforms User Manual Page-Turning ebook software for Mac and Windows platforms 3D Issue is a digital publishing software solution that converts your pdfs into online or offline digital, page-turning editions. Getting

More information

Evaluation Department, Danida. Layout guidelines for evaluation reports

Evaluation Department, Danida. Layout guidelines for evaluation reports Layout guidelines for evaluation reports January 2016 1 1. Introduction These guidelines provide guidance on how to improve the presentation of findings (Section 2), and the layout to be applied to the

More information

CPE 101, reusing/mod slides from a UW course (used by permission) Lecture 5: Input and Output (I/O)

CPE 101, reusing/mod slides from a UW course (used by permission) Lecture 5: Input and Output (I/O) CPE 101, reusing/mod slides from a UW course (used by permission) Lecture 5: Input and Output (I/O) Overview (5) Topics Output: printf Input: scanf Basic format codes More on initializing variables 2000

More information

Practical character sets

Practical character sets Practical character sets In MySQL, on the web, and everywhere Domas Mituzas MySQL @ Sun Microsystems Wikimedia Foundation It seems simple a b c d e f a ą b c č d e ę ė f а б ц д е ф פ ע ד צ ב א... ---...

More information

Getting Started Values, Expressions, and Statements CS GMU

Getting Started Values, Expressions, and Statements CS GMU Getting Started Values, Expressions, and Statements CS 112 @ GMU Topics where does code go? values and expressions variables and assignment 2 where does code go? we can use the interactive Python interpreter

More information

What does it take to implement VISTA outside of the US?

What does it take to implement VISTA outside of the US? What does it take to implement VISTA outside of the US? By Sam Habiel, Pharm.D. Director of Technology VISTA Expertise Network Sam.habiel@gmail.com @givethgoodmumps Outline Support for different Character

More information

IDN - the protocol. Patrik Fältström

IDN - the protocol. Patrik Fältström IDN - the protocol Patrik Fältström paf@cisco.com 1 In the beginning 3454 Preparation of Internationalized Strings ("stringprep"). P. Hoffman, M. Blanchet. December 2002. (Format: TXT=138684 bytes) (Status:

More information

L435/L555. Dept. of Linguistics, Indiana University Fall 2016

L435/L555. Dept. of Linguistics, Indiana University Fall 2016 for : for : L435/L555 Dept. of, Indiana University Fall 2016 1 / 12 What is? for : Decent definition from wikipedia: Computer programming... is a process that leads from an original formulation of a computing

More information

Publications Database

Publications Database Getting Started Guide Publications Database To w a r d s a S u s t a i n a b l e A s i a - P a c i f i c!1 Table of Contents Introduction 3 Conventions 3 Getting Started 4 Suggesting a Topic 11 Appendix

More information