Indic NLP Library Documentation


Release 0.2
Anoop Kunchukuttan
Apr 27, 2018

Contents

1 indicnlp Package
    1.1 common Module
    1.2 langinfo Module
    1.3 loader Module
    1.4 Subpackages
2 Indices and tables
Python Module Index


CHAPTER 1
indicnlp Package

1.1 common Module

exception indicnlp.common.IndicNlpException(msg)
    Bases: exceptions.Exception
    Exceptions thrown by Indic NLP Library components are instances of this class. The msg attribute contains the exception details.

indicnlp.common.get_resources_path()
    Get the path to the Indic NLP Resources directory.

indicnlp.common.init()
    Initialize the module. The following actions are performed:
    * Checks if the INDIC_RESOURCES_PATH variable is set. If not, checks if it can be initialized from the INDIC_RESOURCES_PATH environment variable. If that fails, an exception is raised.

indicnlp.common.set_resources_path(resources_path)
    Set the path to the Indic NLP Resources directory.

1.2 langinfo Module

indicnlp.langinfo.get_offset(c, lang)
    Applicable to Brahmi derived Indic scripts.

indicnlp.langinfo.in_coordinated_range(c_offset)
    Applicable to Brahmi derived Indic scripts.

indicnlp.langinfo.is_approximant(c, lang)
    Is the character an approximant consonant?

indicnlp.langinfo.is_aspirated(c, lang)
    Is the character an aspirated consonant?

indicnlp.langinfo.is_aum(c, lang)
    Is the character the aum character?

indicnlp.langinfo.is_consonant(c, lang)
    Is the character a consonant?

indicnlp.langinfo.is_dental(c, lang)
    Is the character a dental?

indicnlp.langinfo.is_fricative(c, lang)
    Is the character a fricative consonant?

indicnlp.langinfo.is_halanta(c, lang)
    Is the character the halanta character?

indicnlp.langinfo.is_indiclang_char(c, lang)
    Applicable to Brahmi derived Indic scripts.

indicnlp.langinfo.is_labial(c, lang)
    Is the character a labial?

indicnlp.langinfo.is_nasal(c, lang)
    Is the character a nasal consonant?

indicnlp.langinfo.is_nukta(c, lang)
    Is the character the nukta character?

indicnlp.langinfo.is_number(c, lang)
    Is the character a number?

indicnlp.langinfo.is_palatal(c, lang)
    Is the character a palatal?

indicnlp.langinfo.is_retroflex(c, lang)
    Is the character a retroflex?

indicnlp.langinfo.is_unaspirated(c, lang)
    Is the character an unaspirated consonant?

indicnlp.langinfo.is_unvoiced(c, lang)
    Is the character an unvoiced consonant?

indicnlp.langinfo.is_velar(c, lang)
    Is the character a velar?

indicnlp.langinfo.is_voiced(c, lang)
    Is the character a voiced consonant?

indicnlp.langinfo.is_vowel(c, lang)
    Is the character a vowel?

indicnlp.langinfo.is_vowel_sign(c, lang)
    Is the character a vowel sign (maatraa)?

indicnlp.langinfo.offset_to_char(c, lang)
    Applicable to Brahmi derived Indic scripts.
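The offset functions above are documented only as "applicable to Brahmi derived Indic scripts". One plausible reading is that a character's offset is its position inside its script's Unicode block, so that the same offset names the corresponding character in another script. The following Python 3 sketch illustrates that idea under this assumption. The table SCRIPT_BLOCK_START, the language codes, and the function names char_offset, char_from_offset, in_script_block and looks_like_consonant are hypothetical and are not the library's API. The block start values are the standard Unicode block starts. The consonant range is taken from the Devanagari block (KA U+0915 to HA U+0939) and is only assumed to carry over to the other scripts.

```python
# Start of each script's Unicode block, keyed by a language code.
# The codes and this table are assumptions made for the sketch.
SCRIPT_BLOCK_START = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "pa": 0x0A00,  # Gurmukhi
    "gu": 0x0A80,  # Gujarati
    "or": 0x0B00,  # Oriya
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80  # each of these Unicode blocks spans 128 code points


def char_offset(c, lang):
    """Position of character c relative to the start of lang's block."""
    return ord(c) - SCRIPT_BLOCK_START[lang]


def char_from_offset(offset, lang):
    """Character found at the given offset inside lang's block."""
    return chr(SCRIPT_BLOCK_START[lang] + offset)


def in_script_block(c, lang):
    """True if c falls inside the Unicode block used for lang."""
    return 0 <= char_offset(c, lang) < BLOCK_SIZE


def looks_like_consonant(c, lang):
    """Offsets 0x15..0x39 hold KA..HA in Devanagari (assumed elsewhere)."""
    return 0x15 <= char_offset(c, lang) <= 0x39


# Devanagari KA and Bengali KA share the same offset in their blocks.
ka_offset = char_offset("\u0915", "hi")
bengali_ka = char_from_offset(ka_offset, "bn")
```

With this layout, a predicate such as is_consonant(c, lang) reduces to a range check on the offset, which is why one function can serve several scripts.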

1.3 loader Module

1.4 Subpackages

morph Package

unsupervised_morph Module

normalize Package

indic_normalize Module

class indicnlp.normalize.indic_normalize.BengaliNormalizer(remove_nuktas=False)
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Bengali script. In addition to basic normalization by the super class:
    * Replaces the composite characters containing nuktas by their decomposed form
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Canonicalizes two-part dependent vowels
    * Replaces the pipe character by the poorna virama character
    * Replaces colon ':' by visarga if the colon follows a character in this script

    NUKTA = u'\u09bc'

    normalize(text)

class indicnlp.normalize.indic_normalize.DevanagariNormalizer(remove_nuktas=False)
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Devanagari script. In addition to basic normalization by the super class:
    * Replaces the composite characters containing nuktas by their decomposed form
    * Replaces the pipe character by the poorna virama character
    * Replaces colon ':' by visarga if the colon follows a character in this script

    NUKTA = u'\u093c'

    get_char_stats(text)

    normalize(text)

class indicnlp.normalize.indic_normalize.GujaratiNormalizer(remove_nuktas=False)
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Gujarati script. In addition to basic normalization by the super class:
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Replaces colon ':' by visarga if the colon follows a character in this script

    NUKTA = u'\u0abc'

    normalize(text)
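The normalizer classes above share a pattern: a base class performs common clean-up, and each script-specific class adds its own replacements on top. The Python 3 sketch below illustrates that pattern for the three Devanagari steps listed above (nukta decomposition, pipe to poorna virama, colon to visarga). It is not the library's implementation. The class names BaseNormalizerSketch and DevanagariNormalizerSketch are hypothetical, the composite table covers only the precomposed consonants U+0958 to U+095F, and the order of the steps is an assumption. The invisible characters handled by the base class are the ones the NormalizerI documentation lists.

```python
import re


class BaseNormalizerSketch:
    """Stand-in for the super class: handles invisible characters only."""

    # Byte order mark, word joiner, zero width joiner, zero width non-joiner.
    REMOVE = ("\ufeff", "\u2060", "\u200d", "\u200c")
    # Zero width space and no-break space become ordinary spaces.
    TO_SPACE = ("\u200b", "\xa0")

    def normalize(self, text):
        for ch in self.REMOVE:
            text = text.replace(ch, "")
        for ch in self.TO_SPACE:
            text = text.replace(ch, " ")
        return text


class DevanagariNormalizerSketch(BaseNormalizerSketch):
    """Illustrative Devanagari steps layered on the common normalization."""

    NUKTA = "\u093c"
    POORNA_VIRAMA = "\u0964"
    VISARGA = "\u0903"

    # Precomposed nukta consonants and their base consonants
    # (Unicode canonical decompositions; a partial table).
    COMPOSITE_TO_BASE = {
        "\u0958": "\u0915",
        "\u0959": "\u0916",
        "\u095a": "\u0917",
        "\u095b": "\u091c",
        "\u095c": "\u0921",
        "\u095d": "\u0922",
        "\u095e": "\u092b",
        "\u095f": "\u092f",
    }

    def __init__(self, remove_nuktas=False):
        self.remove_nuktas = remove_nuktas

    def normalize(self, text):
        # Common clean-up first, then the script-specific steps.
        text = super().normalize(text)
        for composite, base in self.COMPOSITE_TO_BASE.items():
            text = text.replace(composite, base + self.NUKTA)
        if self.remove_nuktas:
            text = text.replace(self.NUKTA, "")
        text = text.replace("|", self.POORNA_VIRAMA)
        # A colon becomes a visarga only right after a Devanagari character.
        text = re.sub("([\u0900-\u097f]):", "\\1" + self.VISARGA, text)
        return text


normalizer = DevanagariNormalizerSketch()
decomposed = normalizer.normalize("\u0958")
```

The colon rule is restricted to colons that follow a character of the script, so a time such as 10:30 inside Devanagari text keeps its colon.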

class indicnlp.normalize.indic_normalize.GurmukhiNormalizer(remove_nuktas=False)
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Gurmukhi script. In addition to basic normalization by the super class:
    * Replaces the composite characters containing nuktas by their decomposed form
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Replaces the pipe character by the poorna virama character
    * Replaces colon ':' by visarga if the colon follows a character in this script

    NUKTA = u'\u0a3c'

    normalize(text)

class indicnlp.normalize.indic_normalize.IndicNormalizerFactory
    Bases: object
    Factory class to create language specific normalizers.

    get_normalizer(language, remove_nuktas=False)
        Call the get_normalizer function to get the language specific normalizer.
        Parameters:
        * language: language code
        * remove_nuktas: boolean, should the normalizer remove nukta characters

    is_language_supported(language)
        Is the language supported?

class indicnlp.normalize.indic_normalize.KannadaNormalizer
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Kannada script. In addition to basic normalization by the super class:
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Canonicalizes two-part dependent vowel signs
    * Replaces colon ':' by visarga if the colon follows a character in this script

    normalize(text)

class indicnlp.normalize.indic_normalize.MalayalamNormalizer
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Malayalam script. In addition to basic normalization by the super class:
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Canonicalizes two-part dependent vowel signs
    * Changes from the old encoding of chillus (till Unicode 5.0) to the new encoding
    * Replaces colon ':' by visarga if the colon follows a character in this script

    normalize(text)

class indicnlp.normalize.indic_normalize.NormalizerI
    Bases: object
    The normalizer classes do the following:

    * Some characters have multiple Unicode codepoints. The normalizer chooses a single standard representation
    * Some control characters are deleted
    * While typing using the Latin keyboard, certain typical mistakes occur which are corrected by the module

    Base class for normalizer. Performs some common normalization, which includes:
    * Byte order mark, word joiner, etc. removal
    * ZERO_WIDTH_NON_JOINER and ZERO_WIDTH_JOINER removal
    * ZERO_WIDTH_SPACE and NO_BREAK_SPACE replaced by spaces

    Script specific normalizers should derive from this class and override the normalize() method. They can call the super class normalize() method to make use of the common normalization.

    BYTE_ORDER_MARK = u'\ufeff'
    BYTE_ORDER_MARK_2 = u'\ufffe'
    NO_BREAK_SPACE = u'\xa0'
    SOFT_HYPHEN = u'\xad'
    WORD_JOINER = u'\u2060'
    ZERO_WIDTH_JOINER = u'\u200d'
    ZERO_WIDTH_NON_JOINER = u'\u200c'
    ZERO_WIDTH_SPACE = u'\u200b'

    correct_visarga(text, visarga_char, char_range)

    get_char_stats(text)

    normalize(text)
        Method to be implemented for normalization for each script.

class indicnlp.normalize.indic_normalize.OriyaNormalizer(remove_nuktas=False)
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Oriya script. In addition to basic normalization by the super class:
    * Replaces the composite characters containing nuktas by their decomposed form
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Canonicalizes two-part dependent vowels
    * Replaces va with ba
    * Replaces the pipe character by the poorna virama character
    * Replaces colon ':' by visarga if the colon follows a character in this script

    NUKTA = u'\u0b3c'

    normalize(text)

class indicnlp.normalize.indic_normalize.TamilNormalizer
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Tamil script. In addition to basic normalization by the super class:

    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Canonicalizes two-part dependent vowel signs
    * Replaces colon ':' by visarga if the colon follows a character in this script

    normalize(text)

class indicnlp.normalize.indic_normalize.TeluguNormalizer(remove_nuktas=False)
    Bases: indicnlp.normalize.indic_normalize.NormalizerI
    Normalizer for the Telugu script. In addition to basic normalization by the super class:
    * Replaces the reserved character for poorna virama (if used) with the recommended generic Indic scripts poorna virama
    * Canonicalizes two-part dependent vowel signs
    * Replaces colon ':' by visarga if the colon follows a character in this script

    get_char_stats(text)

    normalize(text)

script Package

indic_scripts Module

tokenize Package

indic_tokenize Module

indicnlp.tokenize.indic_tokenize.trivial_tokenize(s, lang='hi')
    Trivial tokenizer for languages in the Indian sub-continent.

indicnlp.tokenize.indic_tokenize.trivial_tokenize_indic(s)
    A trivial tokenizer which just tokenizes on the punctuation boundaries. This also includes punctuations for the Indian language scripts: the purna virama and the deergha virama.
    Returns a list of tokens.

indicnlp.tokenize.indic_tokenize.trivial_tokenize_urdu(s)
    A trivial tokenizer which just tokenizes on the punctuation boundaries. This also includes punctuations for the Urdu script. These punctuation characters were identified from the Unicode database for Arabic script by looking for punctuation symbols.
    Returns a list of tokens.

transliterate Package

itrans_transliterator Module

Transliterate texts between unicode and standard transliteration schemes.

Transliterate texts between non-latin scripts and commonly-used latin transliteration schemes. Uses standard Unicode character blocks, e.g. DEVANAGARI U+0900-U+097F, and transliteration schemes, e.g. the IAST convention for transliteration of Sanskrit to latin-with-dots.

The following character blocks and transliteration schemes are included:

    DEVANAGARI
        IAST
        ITRANS (Sanskrit only)
        Harvard Kyoto
    CYRILLIC
        ISO 9:1995 (Russian only)

New character blocks and transliteration schemes can be added by creating new CharacterBlock and TransliterationScheme objects.

COMMAND LINE USAGE

    python transliterator.py text inputFormat outputFormat

... writes the transliterated text to stdout

    text: the text to be transliterated OR the name of a file containing the text
    inputFormat: the name of the character block or transliteration scheme that the text is to be transliterated FROM, e.g. CYRILLIC, IAST. Not case-sensitive
    outputFormat: the name of the character block or transliteration scheme that the text is to be transliterated TO, e.g. CYRILLIC, IAST. Not case-sensitive

USAGE

Transliterate a text:

    >>> import transliterator
    >>> transliterator.transliterate('yogazcittavRttinirodhaH', 'harvardkyoto',
    ...     'devanagari', {'outputASCIIEncoded' : True})
    '&#x92f;&#x94b;&#x917;&#x936;&#x94d;&#x91a;&#x93f;&#x924;&#x94d;&#x924;&#x935;&#x943;&#x924;&#x94d;&#x924;&#x93f;&#x928;&#x93f;&#x930;&#x94b;&#x927;&#x903;'

Create a new CharacterBlock and TransliterationScheme:

    >>> import transliterator
    >>> cb = transliterator.CharacterBlock('newblock', range(0x901, 0x9FF))
    >>> scheme = transliterator.TransliterationScheme(cb.name, 'NEWSCHEME',
    ...     {'ab': 0x901, 'cd': 0x902})
    >>> transliterator.transliterate('abcd', scheme, cb, {'outputASCIIEncoded' : True})
    '&#x901;&#x902;'

COPYRIGHT AND DISCLAIMER

Transliterator is:

version 0.1 software - use at your own risk.

The IAST, ITRANS and Harvard-Kyoto transliteration schemes have been tested for classical Sanskrit, not for any other language.

The Cyrillic alphabet and ISO 9:1995 transliteration (for Russian only) are included but have been even more lightly tested than Devanagari.

Copyright (c) 2005 by Alan Little

By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:

Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

class indicnlp.transliterate.itrans_transliterator.CharacterBlock(name, charRange, charClass=<class indicnlp.transliterate.itrans_transliterator
    Bases: dict
    Dictionary-like representation of a set of unicode characters.
    For our purposes, a character block corresponds to an alphabet/script that we want to be able to transliterate to or from, e.g. Cyrillic, Devanagari.
    Keys are unicode characters. Values are TLCharacter instances.

class indicnlp.transliterate.itrans_transliterator.DevanagariCharacter(unicodeHexValue, block)
    Bases: indicnlp.transliterate.itrans_transliterator.TLCharacter
    Special processing for Devanagari characters.

class indicnlp.transliterate.itrans_transliterator.DevanagariCharacterBlock(name, charRange)
    Bases: indicnlp.transliterate.itrans_transliterator.CharacterBlock, indicnlp.transliterate.itrans_transliterator._Devanagari
    Class representing the Devanagari Unicode character block.

class indicnlp.transliterate.itrans_transliterator.DevanagariTransliterationScheme(blockName, schemeName, data, swapTable=None)
    Bases: indicnlp.transliterate.itrans_transliterator.TransliterationScheme, indicnlp.transliterate.itrans_transliterator._Devanagari
    Class representing a Devanagari transliteration scheme.

indicnlp.transliterate.itrans_transliterator.ITRANS = {'R^i': 2315, 'gh': 2328, 'ld': 23
    ITrans uses some characters only in common ligatures. The easiest way to deal with these is to replace them with their normal consonant equivalents before we try to transliterate. (This assumes we are mainly transliterating itrans inbound, and that the normal consonants are acceptable outbound. ITrans is not a good choice for outbound anyway because it has so many ambiguities.)
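The ITRANS dictionary above maps Latin letter-equivalents to Unicode code points (2315 is U+090B, 2328 is U+0918). Because some equivalents are prefixes of others, a converter has to decide how much input to consume at each step. The Python 3 sketch below shows one common approach, greedy longest-match lookup. It is an illustration under stated assumptions, not the module's algorithm: the function name longest_match_transliterate is hypothetical, the entry for 'g' is an assumption added to create a prefix conflict, and the special handling that the Devanagari classes perform is left out.

```python
def longest_match_transliterate(text, scheme):
    """Convert text using the longest scheme key that matches at each position.

    scheme maps letter-equivalents (str) to Unicode code points (int).
    Raises KeyError when no key matches at the current position.
    """
    max_len = max(len(key) for key in scheme)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in scheme:
                out.append(chr(scheme[chunk]))
                i += size
                break
        else:
            # No key of any length matched at position i.
            raise KeyError(text[i])
    return "".join(out)


# 'R^i' and 'gh' are taken from the ITRANS excerpt above;
# 'g' (U+0917) is an assumed entry for illustration.
sample_scheme = {"R^i": 2315, "gh": 2328, "g": 2327}

converted = longest_match_transliterate("ghR^ig", sample_scheme)
```

Trying the longest key first means 'gh' is read as one letter instead of 'g' followed by an unknown 'h'.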

class indicnlp.transliterate.itrans_transliterator.TLCharacter(unicodeHexValue, block)
    Bases: object
    Class representing a Unicode character with its equivalents.

    Public attributes:
        unicodeHexValue: the numeric value of the Unicode code point.
        unichr: the character value of the Unicode code point.
        name: the name of the Unicode code point.
        equivalents: a dict containing the character's equivalents in various transliteration schemes, in the format:
            {'Scheme A': 'A', 'Scheme B': 'aah', ...}
        where keys are TransliterationScheme names, values are transliterated equivalents of the character.

    addEquivalent(equivName, equivalent)
        Add an equivalent for the character.
        Arguments:
            equivName: the name of a TransliterationScheme
            equivalent: string/unicode equivalent in the named TransliterationScheme for this code point.

    unicodeHexValue = None
        Use name check to filter out unused characters. unicodedata.name() raises ValueError for these.

class indicnlp.transliterate.itrans_transliterator.TransliterationScheme(blockName, schemeName, data, swapTable=None)
    Bases: dict
    Dictionary-like representation of a transliteration scheme, e.g. the Harvard-Kyoto, IAST or ITRANS schemes for transliterating Devanagari to or from the latin alphabet.
    Keys are unicode strings representing the letter-equivalents used in the transliteration scheme. Values are TLCharacter instances.

indicnlp.transliterate.itrans_transliterator.main(argv=None)
    Call transliterator from a command line.

        python transliterator.py text inputFormat outputFormat

    ... writes the transliterated text to stdout

        text: the text to be transliterated OR the name of a file containing the text
        inputFormat: the name of the character block or transliteration scheme that the text is to be transliterated FROM, e.g. CYRILLIC, IAST. Not case-sensitive
        outputFormat: the name of the character block or transliteration scheme that the text is to be transliterated TO, e.g. CYRILLIC, IAST. Not case-sensitive

indicnlp.transliterate.itrans_transliterator.resetOptions()
    Reset options to their default values.

indicnlp.transliterate.itrans_transliterator.transliterate(text, inFormat, outFormat, requestOptions={})
    Transliterate a text.

    Keyword arguments:
        text: a unicode string containing the text to be transliterated
        inFormat: the "from" CharacterBlock or TransliterationScheme, or its name
        outFormat: the target CharacterBlock or TransliterationScheme, or its name
        requestOptions: optional dict containing option settings that override the defaults for this request.

    Returns a unicode object containing the text transliterated into the target character set.

    Raises:
        ValueError: unrecognised input or output format.
        KeyError: a character in text is not a member of inFormat, or has no corresponding character defined in outFormat.

sinhala_transliterator Module

class indicnlp.transliterate.sinhala_transliterator.SinhalaDevanagariTransliterator
    Bases: object
    A Devanagari to Sinhala transliterator based on explicit Unicode mapping.

    static devanagari_to_sinhala(text)

    devnag_sinhala_map = {u'\u0901': u'\u0d82', u'\u0900': u'\u0d82', u'\u0903': u'\u0d8

    sinhala_devnag_map = {u'\u0d83': u'\u0903', u'\u0d82': u'\u0902', u'\u0d85': u'\u090

    static sinhala_to_devanagari(text)

unicode_transliterate Module

class indicnlp.transliterate.unicode_transliterate.ItransTransliterator
    Bases: object
    Transliterator between Indian scripts and ITRANS.

    static from_itrans(text, lang_code)

    static to_itrans(text, lang_code)

class indicnlp.transliterate.unicode_transliterate.UnicodeIndicTransliterator
    Bases: object
    Base class for rule-based transliteration among Indian languages.
    Script pair specific transliterators should derive from this class and override the transliterate() method. They can call the super class transliterate() method to make use of the common transliteration.

    static transliterate(text, lang1_code, lang2_code)
        Convert the source language script (lang1) to the target language script (lang2).
        text: text to transliterate
        lang1_code: language 1 code
        lang2_code: language 2 code
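SinhalaDevanagariTransliterator above is described as based on an explicit Unicode mapping, with one dictionary per direction. The Python 3 sketch below shows how such a character-to-character table can be applied with str.translate. It is an illustration, not the class's code: the function name map_characters is hypothetical, and the two tables contain only the complete entries visible in the truncated devnag_sinhala_map and sinhala_devnag_map listings above.

```python
# Only the complete pairs visible in the truncated listings are used here.
DEVNAG_TO_SINHALA = {"\u0901": "\u0d82", "\u0900": "\u0d82"}
SINHALA_TO_DEVNAG = {"\u0d83": "\u0903", "\u0d82": "\u0902"}


def map_characters(text, mapping):
    """Replace each character found in mapping; leave all others unchanged."""
    return text.translate(str.maketrans(mapping))


# KA (U+0915) has no entry in this excerpt and passes through unchanged.
to_sinhala = map_characters("\u0915\u0901", DEVNAG_TO_SINHALA)

# Two Devanagari signs share one Sinhala target, so a round trip
# does not always return the original character.
round_trip = map_characters(
    map_characters("\u0901", DEVNAG_TO_SINHALA), SINHALA_TO_DEVNAG
)
```

Keeping a separate dictionary per direction, as the class does, makes this kind of many-to-one mapping explicit instead of relying on an inverted table.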

CHAPTER 2
Indices and tables

* genindex
* modindex
* search


Python Module Index

i
    indicnlp.common
    indicnlp.langinfo
    indicnlp.normalize.indic_normalize
    indicnlp.tokenize.indic_tokenize
    indicnlp.transliterate.itrans_transliterator
    indicnlp.transliterate.sinhala_transliterator
    indicnlp.transliterate.unicode_transliterate


