Multilingual mathematical e-document processing


Azzeddine LAZREK
University Cadi Ayyad, Faculty of Sciences, Department of Computer Science
Marrakech, Morocco
lazrek@ucam.ac.ma
http://www.ucam.ac.ma/fssm/rydarab

1 Features
2 Mathematical language
   Practices
   Arabic mathematical notation
   Internationalization & Localization
3 Problems
   Normalization
   Mathematical font
   Symbols encoding
   Typesetting system
4 MathML I18n
5

Features

Arabic alphabet
- Writing runs from right to left (input order versus rendering order).
- The shape of a letter depends on its position in the word. There are four forms: isolated, final, medial, and initial.
- The writing is cursive: characters are joined.
- Arabic letters represent only consonants or long vowels. Optional diacritical marks for short vowels can be used to annotate text or to spell it out in full when desired.
- Some letters differ only in the presence, number, and position of dots.
- Some letters differ only in some parts of their glyphs.
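The positional forms can be inspected in the Unicode character database. The following is a minimal sketch, not part of the original slides: it assumes Python's standard `unicodedata` module and scans the Arabic Presentation Forms-B block, whose compatibility decompositions are tagged isolated, final, initial, or medial. The helper name `contextual_forms` is hypothetical.

```python
import unicodedata


def contextual_forms(letter: str) -> dict[str, str]:
    """Map a form tag (isolated, final, ...) to the presentation-form character."""
    forms = {}
    target = [f"{ord(letter):04X}"]
    # Arabic Presentation Forms-B occupies U+FE70..U+FEFF.
    for cp in range(0xFE70, 0xFF00):
        decomp = unicodedata.decomposition(chr(cp))
        # Relevant entries look like "<initial> 0628"; unassigned code points give "".
        if not decomp.startswith("<"):
            continue
        tag, *targets = decomp.split()
        if targets == target:
            forms[tag.strip("<>")] = chr(cp)
    return forms


beh = "\u0628"  # ARABIC LETTER BEH
# Bidirectional class of the letter; Arabic letters are stored in logical
# (input) order and reordered for display by the rendering engine.
print(unicodedata.bidirectional(beh))
for tag, ch in contextual_forms(beh).items():
    print(f"{tag}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

In practice the shaping engine or font selects these forms automatically; the presentation-form code points exist for compatibility and are used here only to make the four shapes visible.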

Features

- Punctuation marks have their own particularities of orientation and glyph.
- Letters can be superposed through ligatures that are encoded as characters.
- Letters can be stretched in a curvilinear way through the kashida, a curved elongation of a character.
- Some letters and words can also be superposed without being encoded as characters.
- Several calligraphic styles are in use. Letter shapes, ligatures, calligraphic rules, and so on vary according to the style.
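As a small illustration of the punctuation and elongation points, the sketch below uses Python's standard `unicodedata` module to compare some Latin punctuation marks with their Arabic counterparts, and inserts U+0640 ARABIC TATWEEL to elongate a word. The `stretch` helper is hypothetical, and tatweel only yields a straight, fixed-width elongation; the curvilinear kashida described in the slides has to come from the font or typesetting system.

```python
import unicodedata

# Latin punctuation and the separately encoded Arabic counterparts.
PUNCTUATION_PAIRS = {",": "\u060C", ";": "\u061B", "?": "\u061F"}

TATWEEL = "\u0640"  # ARABIC TATWEEL, the encoded (linear) kashida


def stretch(word: str, index: int, count: int = 1) -> str:
    """Insert `count` tatweel characters before position `index` of `word`."""
    return word[:index] + TATWEEL * count + word[index:]


for latin, arabic in PUNCTUATION_PAIRS.items():
    print(f"{unicodedata.name(latin)} / {unicodedata.name(arabic)}")

word = "\u0628\u0627\u0628"  # beh, alef, beh (illustrative input)
stretched = stretch(word, 1, 3)
print([f"U+{ord(c):04X}" for c in stretched])
```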

Mathematical language

Most electronic scientific, technical, engineering, and medical documents contain mathematical expressions.

Since the Renaissance, mathematical language has evolved deeply and continuously within the Western natural languages in which mathematical knowledge was produced. Typesetting improvements followed, always in those same languages.

The same was not true of Arabic. Throughout this period, when mathematical development went hand in hand with linguistic evolution, Arabic mathematical language stayed outside the mathematical mainstream. Countries that developed mathematics acquired a mathematical language rooted in their natural language. Arabic mathematical language remained as it had been centuries earlier until modern mathematics arrived, and mathematics then had to be taught in foreign languages. Later, in some countries, the teaching of mathematics was partially or totally Arabized.

Mathematical language

Mathematical notation varies by region:
- Latin mathematical presentation, as in English or French: symbols are imported from the writing of one of these European languages, according to the dominant cultural influence. Symbolic writing then runs in the opposite direction to the natural language (text and expressions are intermixed bidirectionally).
- Arabic mathematical presentation: specific symbols are used, and the writing follows the direction of natural-language handwriting, from right to left. Arabic presentation uses symbols drawn from the Arabic alphabet. Other symbols can be Latin symbols reflected about a vertical axis. The so-called Arabic or Arabic-Indic digits represent numbers.

Mathematical language

Many difficulties in learning mathematics stem from the inefficiency of the code of communication, the writing:
- Bidirectionality of writing: in handwriting, whenever a line begins with an Arabic sentence followed by a Latin mathematical expression, the length of the expression must be known in advance. Otherwise, the symbolic formulation will overrun the sentence or, on the contrary, stand far from it.
- Reading symbols: if a foreign abbreviation is pronounced as in the foreign language, the corresponding Arabic word will never be used. If the abbreviation is pronounced in Arabic, there will be no relationship between what is written and what is said.
- Origins of symbols: the relationship between a symbol's shape and the name of the concept it symbolizes vanishes completely when the Western symbol is used.
- Cohabitation of two punctuation systems with identical functions.

Mathematical language

- Expressions run from right to left.
- Alphabetic symbols are used with or without dots (compare i and ı).
- The alphabetic order used in mathematics differs from the one used in text.
- No cursive joining is applied between adjacent alphabetic symbols (for example, in the counterpart of abc).
- Several styles are used to extend the stock of symbols.
- There are two numeral systems (Arabic-Indic digits and 0123456789).
- Arabic abbreviations are composed of connected, cursively joined letters without diacritical signs, with or without dots.
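The coexistence of two numeral systems can be handled mechanically, since both are decimal and positional. The sketch below is an illustration rather than anything from the original slides: it assumes the Arabic-Indic digits U+0660 to U+0669 and uses hypothetical helper names `to_arabic_indic` and `to_european`.

```python
import unicodedata

EUROPEAN = "0123456789"
# ARABIC-INDIC DIGIT ZERO..NINE are the ten code points U+0660..U+0669.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))

_TO_ARABIC_INDIC = str.maketrans(EUROPEAN, ARABIC_INDIC)
_TO_EUROPEAN = str.maketrans(ARABIC_INDIC, EUROPEAN)


def to_arabic_indic(text: str) -> str:
    """Replace European digits by Arabic-Indic digits, leaving other text alone."""
    return text.translate(_TO_ARABIC_INDIC)


def to_european(text: str) -> str:
    """Replace Arabic-Indic digits by European digits, leaving other text alone."""
    return text.translate(_TO_EUROPEAN)


sample = to_arabic_indic("314")
print([unicodedata.name(c) for c in sample])
# Python's int() accepts any Unicode decimal digits, so no conversion is needed
# to recover the numeric value.
print(int(sample))
```

Digits in both systems are stored most significant digit first; only the surrounding text direction differs.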

Mathematical language

- Superscripts and subscripts are placed on the left (for example, in the Arabic form of b^2).
- Some symbols are mirrored, both static symbols and variable-sized symbols.
- Some literal symbols are used, with a curvilinear kashida instead of the linear one.
- There are two punctuation systems (for example, in decimal notation such as 3,14).

These differences are an additional source of difficulty for tools that compose mathematical documents in Arabic. Moreover, the use of a mathematical symbol can be localized: different local options can exist to denote the same symbol. Both of the preceding symbols can denote the sum operator.
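Two of these points can be probed with the Unicode character database. This is a minimal sketch assuming Python's standard `unicodedata` module: the Bidi_Mirrored property flags characters whose glyph should be mirrored in right-to-left context, and U+066B is the separately encoded Arabic decimal separator. The helper names `is_mirrored` and `localize_decimal` are hypothetical.

```python
import unicodedata


def is_mirrored(ch: str) -> bool:
    """True if the character has the Unicode Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1


def localize_decimal(number_text: str, separator: str = "\u066B") -> str:
    """Replace the first '.' decimal point by the given separator."""
    return number_text.replace(".", separator, 1)


# Parenthesis, less-than, n-ary summation, square root, plus, equals.
for ch in "(<\u2211\u221A+=":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: mirrored={is_mirrored(ch)}")

print(localize_decimal("3.14", ","))
print(unicodedata.name("\u066B"))
```

The property only says that a mirrored glyph is wanted; whether the font actually provides a well-drawn mirrored shape is a separate matter.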

Mathematical language

In a multilingual environment, mathematical objects in Arabic should be able to:

Pedagogical goals
- be composed precisely and with high quality
- be adapted to the region
- be adjusted to the schooling level
- be personalized for disabilities
- be translated between notations
- be computed and interacted with for learning

Technical goals
- separate content from its several presentations
- be offered in several formats allowing exchange between software
- be published on the web
- be platform independent

The intent is to encode mathematical expressions in a form that is human-legible and writable by hand while preserving all information.

Problems

Image-based method
- Quality: poor typographical quality
- Size: large files to store or download
- Reuse: mathematical expressions cannot be re-edited
- Structuring: mathematical information is not available for searching, indexing, and so on
- Portability: mathematical expressions cannot be processed by a computer algebra system or a translation system

Needs
- Normalization
- Mathematical font
- Symbols encoding
- Typesetting system

Problems

Sources of the standards:
- handbooks
- conventions such as the Amman convention
- standards of the AMS and the ISO
- practices of TeX and MathML

Problems

Creating a mathematical font is a complex artistic and technical task.

RamzArab
Design and development of the OpenType font RamzArab, which tries to meet, as far as possible, the following requirements:
- Homogeneity: symbols are designed with the same pen so that shapes, sizes, boldness, and so on are homogeneous
- Completeness: most of the usual Arabic-specific symbols in use are gathered together
- Originality: most practical calligraphic rules are observed

Problems

CurExt
Design of the CurExt system for dynamic symbols. Parentheses and kashida symbols are representative of the possible cases of horizontal and vertical curvilinear extensibility, so the approach can easily be generalized to other variable-sized symbols. Lengthening a character with a straight line does not conform to the rules of Arabic calligraphy.

Problems

Unicode
The Unicode Standard provides a fairly complete set of standard mathematical characters. Many symbols found in Arabic mathematical handbooks are not yet part of the Unicode Standard and cannot be obtained through simple mirroring or a simple implementation process. Some of these special characters, designed in RamzArab, have been proposed for inclusion in the Unicode Standard and are now under discussion. Until Unicode adopts them, the symbols used in these tools are located in the Private Use Area, E000-F8FF, in the Basic Multilingual Plane.
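Because Private Use Area code points carry no standard meaning, a document that relies on them only renders correctly with the matching font. A minimal sketch, assuming Python's standard `unicodedata` module and a hypothetical helper `find_private_use`, shows how such characters could be located in a text before it is exchanged:

```python
import unicodedata

# Private Use Area of the Basic Multilingual Plane, as cited in the slides.
PUA_START, PUA_END = 0xE000, 0xF8FF


def in_bmp_private_use(ch: str) -> bool:
    """True if the character lies in the BMP Private Use Area."""
    return PUA_START <= ord(ch) <= PUA_END


def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every BMP private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if in_bmp_private_use(ch)
    ]


# The code point U+E001 is an arbitrary illustration, not an actual RamzArab
# assignment.
sample = "x \uE001 y"
print(find_private_use(sample))
# Private-use characters have general category "Co" in the Unicode database.
print(unicodedata.category("\uE001"))
```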

RydArab
The RydArab system was developed as a TeX package. It preserves all the distinguishing qualities:
- fully digital composition
- high typographical quality
- several options to adapt it to different areas and levels
- the same command structure, in order to automate translation
- generation of documents in several formats (DVI, PS, PDF)
- production of HTML documents with image-based mathematical expressions
- a step towards transformation from/to OpenMath and MathML

Content MathML
Semantically, an Arabic mathematical expression has the same function as its Latin equivalent: the Content MathML tree has the same skeleton, and only the content of the token elements differs.
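The claim about a shared skeleton can be illustrated with a small Python sketch. The two expressions below (x + 1 and an Arabic counterpart using the letter seen and an Arabic-Indic digit) are illustrative assumptions, not examples taken from the talk, and the MathML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def skeleton(element):
    """Return the tree shape of element: tag names only, token text ignored."""
    return (element.tag, tuple(skeleton(child) for child in element))


def token_texts(element):
    """Return the non-empty text content of the elements, in document order."""
    return [node.text.strip() for node in element.iter()
            if node.text and node.text.strip()]


# Hypothetical Content MathML for "x + 1" in Latin and in Arabic notation.
latin = ET.fromstring("<apply><plus/><ci>x</ci><cn>1</cn></apply>")
arabic = ET.fromstring("<apply><plus/><ci>س</ci><cn>١</cn></apply>")

same_skeleton = skeleton(latin) == skeleton(arabic)
same_tokens = token_texts(latin) == token_texts(arabic)
```

Here `same_skeleton` is true while `same_tokens` is false: the structure is shared and only the token content changes.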

Presentation MathML
Only rendering aspects need to be taken into account. However, keeping the same Presentation MathML tree skeleton and changing only the content of the token elements is insufficient.

MathML extension
After examining all the notational conventions in current use with Arabic, the next step is to clarify the MathML specification, proposing extensions where needed, so that MathML has the broadest possible coverage. The proposals are:
- Direction: the overall mathematical directionality should be determined by a dir attribute on the outermost math element, taking one of the values ltr or rtl. The text content of each token element should be treated as a separate directional segment, and the bidirectional algorithm should be applied to each independently.
- Additional mathvariant values: isolated, initial, tailed, looped, stretched and double-struck.
- Mirroring: code points for Arabic mathematical symbols are not yet available, but should be appropriately marked for mirroring.
- Arabic-specific notation: an additional allowed value, madruwb, for the notation attribute of menclose, used for the factorial symbol.
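The direction and mirroring proposals rely on two Unicode character properties, which can be inspected with Python's standard unicodedata module. The sketch below is a deliberately simplified assumption: it only finds the first strong character of a token (in the spirit of rules P2 and P3 of the Unicode Bidirectional Algorithm) and does not implement the full algorithm. The function names are hypothetical.

```python
import unicodedata


def token_base_direction(text):
    """Guess the base direction of one token from its first strong character.

    Returns "ltr", "rtl", or None when the token has no strong character
    (for example a token made only of digits or operators).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def is_mirrored(ch):
    """Return True if Unicode marks ch as mirrored in right-to-left text."""
    return unicodedata.mirrored(ch) == 1
```

Treating each token separately, as the proposal suggests, means a function of this kind would be applied to the text of every token element on its own rather than to the whole formula.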


MathML CtoAP
Content MathML is generally language-neutral. In contrast, Presentation MathML necessarily targets a specific language and its notational conventions. After mathematical operations have been carried out on content code, the result usually has to be rendered. CtoAP is an XSLT stylesheet that converts Content MathML to Arabic Presentation MathML. Arabic mathematical notation may offer several notations for one concept, among which a choice has to be made.
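The idea of a content-to-presentation conversion with a notation choice can be sketched in Python. This is not CtoAP itself, which is an XSLT stylesheet whose rules are not reproduced here; it is a toy converter under stated assumptions: only `ci`, `cn` and `apply` with `plus` are handled, namespaces are omitted, and the `arabic_digits` flag is a hypothetical stand-in for one possible notational choice (Arabic-Indic versus European digits).

```python
import xml.etree.ElementTree as ET

# Translation table from European digits to Arabic-Indic digits.
ARABIC_INDIC = str.maketrans("0123456789", "٠١٢٣٤٥٦٧٨٩")


def to_presentation(node, arabic_digits=True):
    """Convert one Content MathML node to a Presentation MathML element."""
    if node.tag == "ci":
        out = ET.Element("mi")
        out.text = (node.text or "").strip()
        return out
    if node.tag == "cn":
        out = ET.Element("mn")
        text = (node.text or "").strip()
        out.text = text.translate(ARABIC_INDIC) if arabic_digits else text
        return out
    if node.tag == "apply":
        operator, *operands = list(node)
        if operator.tag != "plus":
            raise NotImplementedError("only <plus/> is handled in this sketch")
        row = ET.Element("mrow")
        for index, operand in enumerate(operands):
            if index > 0:
                ET.SubElement(row, "mo").text = "+"
            row.append(to_presentation(operand, arabic_digits))
        return row
    raise NotImplementedError("unsupported element: " + node.tag)


def content_to_arabic_presentation(xml_text, arabic_digits=True):
    """Wrap the converted expression in a right-to-left math element."""
    math = ET.Element("math", {"dir": "rtl"})
    math.append(to_presentation(ET.fromstring(xml_text), arabic_digits))
    return ET.tostring(math, encoding="unicode")
```

Calling `content_to_arabic_presentation("<apply><plus/><ci>س</ci><cn>2</cn></apply>")` yields a right-to-left `math` element whose number token is written with an Arabic-Indic digit; passing `arabic_digits=False` keeps the European digit instead.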

Dadzilla
Dadzilla is a MathML browser for Arabic mathematical presentation, adapted from Mozilla. Existing MathML elements can be used for both Latin and Arabic mathematical presentations once the environment is specified by the single attribute dir="rtl" on the <math> element. To display MathML documents correctly in Arabic presentation, some fonts must be installed: the mirrored BaKoMa Computer Modern fonts and RamzArab, the adapted Arabic mathematical symbols font. The mathematical expression editor mathmled has been integrated after adaptation for Arabic mathematical presentation. Future versions of Dadzilla should bring some improvements, in particular the use of stretched Arabic alphabetic symbols, some localization choices and a better interface.
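Switching an existing formula to the Arabic presentation environment amounts to setting that single attribute. The following Python sketch shows one way to do so on a MathML string; the function name is hypothetical and nothing here reflects how Dadzilla itself is implemented.

```python
import xml.etree.ElementTree as ET

ALLOWED_DIRECTIONS = ("ltr", "rtl")


def set_math_direction(mathml_text, direction="rtl"):
    """Return mathml_text with the dir attribute set on the outermost math element."""
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError("direction must be 'ltr' or 'rtl'")
    root = ET.fromstring(mathml_text)
    # Accept both a plain <math> tag and a namespace-qualified one.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "math":
        raise ValueError("the outermost element must be <math>")
    root.set("dir", direction)
    return ET.tostring(root, encoding="unicode")
```

The element tree is left unchanged apart from the attribute, which matches the statement that the same MathML elements serve both presentations.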

Conclusion
The ambitious project of developing communication and publication tools for Arabic scientific and technical e-documents in a multilingual environment presents many challenges. Our goal was to identify the difficulties and limitations that can obstruct the use of existing tools for composing mathematics in Arabic presentation. With the tools developed, Arabic mathematical e-documents can be composed, structured and published on the web, and thus benefit from all the advantages of using available standards. We hope that the preceding proposals can help in finding suitable recommendations for multilingual mathematics in the appropriate standards (Unicode, MathML, OpenMath, ...).

[Figure: RamzArab]

[Figure: Dadzilla]