Character Encodings. Fabian M. Suchanek
1 Character Encodings. Fabian M. Suchanek
2 Semantic IE [Diagram of the processing pipeline with a "You are here" marker. Stages: Reasoning; Fact Extraction; Instance Extraction; Entity Disambiguation; Entity Recognition; Source Selection and Preparation.]
3 Scripts: "Thanks for all the fish" (Google Translate. No warranty.)
4 Scripts: "Thanks for all the fish" shown in Latin, Simplified Chinese, Hebrew, Arabic, Korean, and Thai script (Google Translate. No warranty.)
5 How to map characters to bytes? Sample characters: y, A, a, ß, ?, e, é. 1 byte = 8 bits = 256 possible values (0-255), but there are ...,000 different characters from 90 scripts.
6 Def: Character encoding. A character encoding (also: char encoding) is a bijective mapping from characters to (sequences of) bytes. Characters -> bytes: A -> 65; B -> 66; ü -> 99, 42; ... -> 99, 99, 2.
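The definition can be tried out with Python's built-in `str.encode` and `bytes.decode`. The sketch below assumes Latin-1 and UTF-8 as example encodings; the helper name `to_byte_values` is made up for illustration, and the byte values it prints are the real ones rather than the placeholder values on the slide.

```python
def to_byte_values(text, encoding):
    """Return the list of byte values that `encoding` assigns to `text`."""
    return list(text.encode(encoding))


for ch in ["A", "B", "ü"]:
    for enc in ["latin-1", "utf-8"]:
        data = ch.encode(enc)
        # Decoding with the same encoding gives the character back.
        assert data.decode(enc) == ch
        print(ch, enc, to_byte_values(ch, enc))
```

The same character can map to different byte sequences under different encodings, which is why the encoding always has to be known when reading bytes back.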
7 Def: ASCII encoding. The ASCII encoding is a particular character encoding that maps certain chars to single bytes and ignores the others. A -> 65; B -> 66; C -> 67; ...; ü -> (not mapped). 26 uppercase letters + 26 lowercase letters + punctuation: about 100 characters. Disadvantage: works only for English.
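A small sketch of what "ignores the others" means in practice, using Python's standard `ascii` codec. The helper name `ascii_bytes` is an assumption made for this example.

```python
def ascii_bytes(text):
    """Encode text as ASCII; return None if some character is not covered."""
    try:
        return list(text.encode("ascii"))
    except UnicodeEncodeError:
        # ASCII has no byte for this character.
        return None


print(ascii_bytes("ABC"))   # the letters map to single bytes
print(ascii_bytes("ü"))     # not representable in ASCII

# With errors="replace", unmappable characters become a question mark.
replaced = "Müller".encode("ascii", errors="replace")
print(replaced)
```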
8 Def: Code pages. A code page is a character encoding that maps script-specific characters to single bytes. Greek code page: A -> 65; B -> 66; α -> ... Western code page: A -> 65; B -> 66; à -> ... (0-127 are usually mapped as in ASCII.) Disadvantages: we have to know the code page; we cannot mix scripts; we cannot represent more than 256 characters. Example (View -> Encoding)
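The first disadvantage can be illustrated by decoding the same bytes under two code pages. This sketch assumes ISO 8859-1 as the Western code page and ISO 8859-7 as the Greek one (the slide does not name specific code pages), using Python's codecs `latin-1` and `iso8859_7`.

```python
# Three bytes: 65 and 66 are in the ASCII range, 0xE1 is above 127.
raw = bytes([65, 66, 0xE1])

as_western = raw.decode("latin-1")    # ISO 8859-1 (Western European)
as_greek = raw.decode("iso8859_7")    # ISO 8859-7 (Greek)

print(as_western)
print(as_greek)
```

The bytes below 128 decode identically, while the byte 0xE1 turns into a different character depending on which code page the reader assumes.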
9 Def: HTML entities. HTML entities are a particular character encoding where particular strings (as defined by W3C) represent characters. à -> &agrave;, ü -> &uuml;, ß -> &szlig;, ... These are sequences of bytes if encoded in ASCII. Advantage: works in all browsers. Disadvantage: very clumsy.
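A minimal sketch of this mapping with Python's standard `html` module. `html.unescape` and `html.entities.codepoint2name` are standard library names; the helper `to_entities` is hypothetical and, being a sketch, does not escape `&`, `<`, or `>` themselves.

```python
import html
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character by a named or numeric HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                             # plain ASCII stays as is
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # named entity
        else:
            out.append("&#%d;" % cp)                   # numeric fallback
    return "".join(out)


encoded = to_entities("façade")
print(encoded)
print(encoded.encode("ascii"))   # the result is pure ASCII
print(html.unescape(encoded))    # and decodes back to the original
```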
10 Def: Unicode. Unicode is a character encoding that maps each character to a number in [0, ...], i.e., to 4 bytes. A -> 65 -> 0, 0, 0, 65; B -> 66 -> 0, 0, 0, 66; α -> ... -> 0, 0, 3, 42; ... -> ... -> 0, 0, 4, ... (not the real mappings). Characters 0-127 are as in ASCII. Advantage: maps all known characters. Disadvantage: takes much space.
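For comparison with the placeholder values on the slide, the real numbers can be inspected in Python. The sketch assumes that the fixed 4-byte representation corresponds to what is standardized as UTF-32 (big-endian here, so the bytes read left to right); `code_point_and_utf32` is a name invented for this example.

```python
def code_point_and_utf32(ch):
    """Return the Unicode code point of ch and its 4 big-endian bytes."""
    cp = ord(ch)
    return cp, list(ch.encode("utf-32-be"))


for ch in ["A", "B", "α"]:
    print(ch, code_point_and_utf32(ch))
```

Every character costs 4 bytes in this form, even plain English letters, which is the space disadvantage the slide mentions.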
11 Def: UTF-8. UTF-8 is a particular character encoding that maps Unicode characters to sequences of bytes of different lengths. A -> 65; B -> 66; α -> 128, 42; ... -> 128, 128, 32 (the multi-byte values are illustrative, not the real mappings).
12 UTF-8: Chars 0-0x7F. Unicode chars 0-0x7F are mapped like in ASCII (i.e., to a single byte). A -> 65; B -> 66; a -> 97; b -> 98; $ -> 36; ! -> 33. Advantages: compatibility with ASCII and code pages; space efficiency for English docs.
13 UTF-8: Chars 0x80-0x7FF. Unicode chars 0x80-0x7FF (11 bits) are mapped to two bytes as follows: xxxxxxxxxxx -> 110xxxxx 10xxxxxx (2 bytes). Unicode chars 0x80-0x7FF are Greek, Arabic, Hebrew, etc.
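14 UTF-8: Chars 0x80-0x7FF, example. Unicode 11-bit representation: ç = 0xE7 = 00011100111. Split into 5 + 6 bits and prefixed with the markers: 110 00011, 10 100111, i.e., the two bytes 0xC3 0xA7.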
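The bit pattern for the two-byte case can be reproduced step by step. This is a minimal sketch that works on bit strings for readability; the function name `two_byte_pattern` is an assumption made for this example.

```python
def two_byte_pattern(ch):
    """Return (11-bit code point, first byte, second byte) as bit strings."""
    cp = ord(ch)
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("character is outside the two-byte range")
    bits = format(cp, "011b")      # the 11 x-bits, zero-padded
    first = "110" + bits[:5]       # marker 110 + top 5 bits
    second = "10" + bits[5:]       # marker 10 + low 6 bits
    return bits, first, second


bits, first, second = two_byte_pattern("ç")
print(bits, first, second)

# Turning the bit strings into bytes gives the same result as Python's codec.
manual = bytes([int(first, 2), int(second, 2)])
print(manual == "ç".encode("utf-8"))
```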
15 UTF-8: Chars 0x80-0x7FF, example. Encoding façade: the characters f a ç a ... have the Unicode values 0x66 0x61 0xE7 0x61 ...; in UTF-8 they become the bytes 0x66 0x61 0xC3 0xA7 0x61 ...
16 UTF-8: Chars 0x800-0xFFFF. Unicode chars 0x800-0xFFFF (16 bits) are mapped to three bytes as follows: xxxxxxxxxxxxxxxx -> 1110xxxx 10xxxxxx 10xxxxxx (3 bytes). This character range concerns mainly Chinese.
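Putting the ranges together, a UTF-8 encoder for a single code point can be sketched as follows. The function name `utf8_encode` is hypothetical; the four-byte branch (11110xxx plus three follower bytes) is not spelled out on the slides and is added here as an assumption based on the "1-4 bytes" summary. Surrogate code points are not treated specially.

```python
def utf8_encode(cp):
    """Encode one Unicode code point (an int) as UTF-8 bytes."""
    if cp < 0x80:                       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                  # 11110xxx + three follower bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("not a Unicode code point")


text = "façade 中"
manual = b"".join(utf8_encode(ord(ch)) for ch in text)
print(manual.hex(" "))
print(manual == text.encode("utf-8"))
```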
18 Decoding UTF-8 (byte stream: f a ç a ...). If the byte starts with 0xxxxxxx => it's a normal ASCII character. If the byte starts with 110xxxxx => it's an extended char, 1 byte follows. If the byte starts with 1110xxxx => it's a Chinese char, 2 bytes follow. If the byte starts with 10xxxxxx => it's a follower byte, you messed it up!
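The decoding rules can be sketched as a loop over the bytes. The name `utf8_decode` is made up for this example; the 11110xxx branch for four-byte sequences is an addition not listed on the slide, and the sketch does not reject every malformed input a production decoder would (for example, overlong forms).

```python
def utf8_decode(data):
    """Decode UTF-8 bytes into a string by inspecting the leading bits."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                    # 0xxxxxxx: ASCII, no followers
            cp, n = b, 0
        elif b >> 5 == 0b110:           # 110xxxxx: 1 follower byte
            cp, n = b & 0x1F, 1
        elif b >> 4 == 0b1110:          # 1110xxxx: 2 follower bytes
            cp, n = b & 0x0F, 2
        elif b >> 3 == 0b11110:         # 11110xxx: 3 follower bytes
            cp, n = b & 0x07, 3
        else:                           # 10xxxxxx here means we are out of sync
            raise ValueError("unexpected byte at offset %d" % i)
        followers = data[i + 1:i + 1 + n]
        if len(followers) != n or any(f >> 6 != 0b10 for f in followers):
            raise ValueError("malformed sequence at offset %d" % i)
        for f in followers:
            cp = (cp << 6) | (f & 0x3F)  # append the 6 payload bits
        chars.append(chr(cp))
        i += 1 + n
    return "".join(chars)


print(utf8_decode("façade".encode("utf-8")))
```

Starting in the middle of a character, for example at the second byte of ç, raises a `ValueError`, which corresponds to the "you messed it up" case.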
19 Summary: UTF-8. UTF-8 maps Unicode chars to 1-4 bytes. Advantages: common Western chars are only 1 byte; backwards compatibility with ASCII; stream readability (follower bytes cannot be confused with marker bytes); sorting compliance.
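One reading of "sorting compliance", assumed here since the slide does not elaborate, is that sorting UTF-8 byte strings byte by byte yields the same order as sorting by Unicode values. A quick check in Python, where plain string comparison goes by Unicode value:

```python
words = ["zebra", "façade", "Ärger", "中文", "alpha", "😀", "fab"]

by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

Note that this is ordering by numeric value, not a language-aware alphabetical order.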
20 UTF-16. UTF-8 is inefficient if the text contains many characters between 0x0800 and 0xFFFF: it needs 3 bytes for each. For this reason, UTF-16 has been proposed. It encodes every Unicode character in either 16 or 32 bits. Advantage: less space consumption in the range 0x0800-0xFFFF (2 bytes as opposed to 3 in UTF-8). Disadvantages: more space consumption in the range 0x0000-0x007F (2 bytes instead of 1 in UTF-8); not backwards compatible with ASCII.
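The space trade-off can be measured directly. This sketch uses Python's `utf-16-be` codec so that no byte order mark is counted; `encoded_sizes` is a helper name chosen for the example.

```python
def encoded_sizes(text):
    """Return (bytes in UTF-8, bytes in UTF-16) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


for sample in ["hello", "façade", "中文", "😀"]:
    print(sample, encoded_sizes(sample))
```

English text doubles in size under UTF-16, Chinese text shrinks by a third, and characters above 0xFFFF take 4 bytes in both encodings.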
21 Example: Char encodings in Python
Reading:
    with open("text.txt", encoding="utf-8") as file:
        for line in file:
            print(line)
If the encoding argument is omitted, this uses the default encoding of the operating system, which might be different from UTF-8!
Writing:
    with open("text.txt", "w", encoding="utf-8") as file:
        file.write("Bună ziua!")
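What goes wrong with a mismatched encoding can be seen in a small round trip. The sketch writes to a temporary directory (an assumption made so that it does not touch any existing text.txt) and reads the file back once as UTF-8 and once as Latin-1, which stands in here for a non-UTF-8 system default.

```python
import locale
import os
import tempfile

text = "Bună ziua!"
path = os.path.join(tempfile.mkdtemp(), "text.txt")

with open(path, "w", encoding="utf-8") as file:
    file.write(text)

with open(path, encoding="utf-8") as file:
    right = file.read()     # same encoding as used for writing

with open(path, encoding="latin-1") as file:
    wrong = file.read()     # the two bytes of ă are read as two characters

print(repr(right))
print(repr(wrong))
# The encoding Python would pick if the encoding argument were omitted:
print(locale.getpreferredencoding(False))
```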
22 Example: Char encodings in Java
    import java.io.*;
    // Wrap the byte stream in a Reader that decodes UTF-8.
    File f = new File(...);
    InputStream s = new FileInputStream(f);
    Reader r = new InputStreamReader(s, "UTF-8");
23 Summary: Char encodings. ASCII: only English chars. Code pages: one page per script. HTML entities: work in browsers. Unicode: maps all chars. UTF-8: maps chars to variable # bytes. In most applications, UTF-8 is the encoding of choice.