The process of preparing an application to support more than one language and data format is called internationalization. Localization is the process of adapting an internationalized application to support a specific region or locale. Examples of locale-dependent information include messages and user interface labels, character sets and encoding, and date and currency formats. Although all client user interfaces should be internationalized and localized, it is particularly important for web applications because of the global nature of the web.

In the Java 2 platform, java.util.Locale represents a specific geographical, political, or cultural region. The string representation of a locale consists of the international standard two-character abbreviations for language and country and an optional variant, all separated by underscore (_) characters. Examples of locale strings include fr (French), de_CH (Swiss German), and en_US_POSIX (English on a POSIX-compliant platform). Locale-sensitive data is stored in a java.util.ResourceBundle. A resource bundle contains key-value pairs, where each key uniquely identifies a locale-specific object in the bundle. A resource bundle can be backed by a text file (properties resource bundle) or a class (list resource bundle) containing the pairs. You construct a resource bundle instance by appending a locale string representation to a base name.
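The naming rule above can be illustrated with a minimal sketch that uses class-backed (list) resource bundles so that it runs without any properties files. The class names BundleDemo, Labels, and Labels_fr and the keys greeting and cart are hypothetical and do not come from the Duke's Bookstore application.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleDemo {

    // Hypothetical base bundle holding the default (English) labels.
    public static class Labels extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Hello"},
                {"cart", "Shopping cart"}
            };
        }
    }

    // Hypothetical French bundle: the class name is the base name plus "_fr".
    // It translates only one key; missing keys fall back to the base bundle.
    public static class Labels_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Bonjour"}
            };
        }
    }

    // Looks up a label for the given locale, using the base bundle's class name
    // as the base name.
    public static String label(String key, Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle(Labels.class.getName(), locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        Locale swissGerman = new Locale.Builder().setLanguage("de").setRegion("CH").build();
        System.out.println(swissGerman);                         // locale string: de_CH
        System.out.println(label("greeting", Locale.FRENCH));    // taken from Labels_fr
        System.out.println(label("cart", Locale.CANADA_FRENCH)); // fr_CA -> fr -> base bundle
    }
}
```

A properties-backed bundle would follow the same naming scheme with files instead of classes; the lookup code would stay the same.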

Messages and labels should be tailored according to the conventions of a user's language and region. There are two approaches to providing localized messages and labels in a web application:

1. Provide a version of the JSP page in each of the target locales and have a controller servlet dispatch the request to the appropriate page depending on the requested locale. This approach is useful if large amounts of data on a page or an entire web application need to be internationalized.
2. Isolate any locale-sensitive data on a page into resource bundles, and access the data so that the corresponding translated message is fetched automatically and inserted into the page. Thus, instead of creating strings directly in your code, you create a resource bundle that contains translations and read the translations from that bundle using the corresponding key.

To get the correct strings for a given user, a web application either retrieves the locale (set by a browser language preference) from the request using the getLocale method, or allows the user to explicitly select the locale. After the locale is set, the controller of a web application typically retrieves the resource bundle for that locale and saves it as a session attribute (see Associating Objects with a Session) for use by other components. When a session is initiated, the resource bundle for the user's locale is stored in the localization context. It is also possible to override the resource bundle at runtime for a given scope using the fmt:setBundle tag and for a tag body using the fmt:bundle tag.
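In a servlet the container derives the locale from the browser's language preference and exposes it through getLocale. The sketch below is only a stand-in for that step, using the standard library to match an Accept-Language style string against a list of supported locales. The class name LocaleNegotiation, the SUPPORTED list, and the sample header values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleNegotiation {

    // Hypothetical set of locales for which the application ships translations.
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN);

    // Returns the best supported locale for a language preference string,
    // or the fallback when the string is missing or nothing matches.
    public static Locale choose(String acceptLanguage, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        // Parses weighted ranges such as "fr-CH, fr;q=0.9, en;q=0.8".
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        // Lookup tries each range in priority order, truncating "fr-CH" to "fr".
        Locale match = Locale.lookup(ranges, SUPPORTED);
        return match != null ? match : fallback;
    }

    public static void main(String[] args) {
        System.out.println(choose("fr-CH, fr;q=0.9, en;q=0.8", Locale.ENGLISH));
        System.out.println(choose("ja", Locale.ENGLISH));
    }
}
```

The chosen locale would then be used to load the resource bundle that the controller stores in the session.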

The JSP versions of the Duke's Bookstore application use the fmt:message tag to provide localized strings for messages, HTML link text, button labels, and error messages.

Messages that are not queued on a component and are therefore not loaded automatically are referenced using a value expression. You can reference a localized message from almost any JavaServer Faces tag attribute. The value expression that references a message has the same notation whether you loaded the resource bundle with the loadBundle tag or registered it with the resource-bundle element in the configuration file. The value expression notation is var.message, in which var matches the var attribute of the loadBundle tag or the var element defined in the resource-bundle element of the configuration file, and message matches the key of the message contained in the resource bundle referred to by the var attribute.

Java programs use the DateFormat.getDateInstance(int, Locale) method to parse and format dates in a locale-sensitive manner. Java programs use the NumberFormat.getXXXInstance(Locale) method, where XXX can be Currency, Number, or Percent, to parse and format numerical values in a locale-sensitive manner. The servlet version of Duke's Bookstore uses the currency version of this method to format book prices. JSTL applications use the fmt:formatDate and fmt:parseDate tags to handle localized dates and use the fmt:formatNumber and fmt:parseNumber tags to handle localized numbers, including currency values. For information on the JSTL formatting tags, see Formatting Tags. The JSTL version of Duke's Bookstore uses the fmt:formatNumber tag to format book prices and the fmt:formatDate tag to format the ship date for an order.
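The DateFormat and NumberFormat factory methods can be tried out with a short sketch such as the following. The class name FormatDemo, the helper names, and the sample values are hypothetical; the exact output for locales other than US English can vary with the locale data shipped in the JDK, so only the US currency result is annotated.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class FormatDemo {

    // Formats an amount as currency using the conventions of the given locale.
    public static String price(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Parses a localized number, for example "1.234,5" in a German locale.
    public static double parseNumber(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    // Formats a date in the long style; the time zone is pinned to UTC so the
    // result does not depend on the machine running the example.
    public static String shipDate(Date date, Locale locale) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.LONG, locale);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    public static void main(String[] args) {
        System.out.println(price(1234.5, Locale.US));      // $1,234.50
        System.out.println(price(1234.5, Locale.GERMANY)); // euro amount, German separators
        System.out.println(parseNumber("1.234,5", Locale.GERMANY));
        System.out.println(shipDate(new Date(0L), Locale.FRANCE));
    }
}
```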

The following sections describe character sets and character encodings.

A character set is a set of textual and graphic symbols, each of which is mapped to a nonnegative integer. One of the earliest character sets used in computing was US-ASCII. It is limited in that it can represent only American English. US-ASCII contains uppercase and lowercase Latin alphabets, numerals, punctuation, a set of control codes, and a few miscellaneous symbols.

Unicode defines a standardized, universal character set that can be extended to accommodate additions. When the Java program source file encoding doesn't support Unicode, you can represent Unicode characters as escape sequences by using the notation \uXXXX, where XXXX is the character's 16-bit representation in hexadecimal. For example, the Spanish version of the Duke's Bookstore message file uses Unicode escapes for non-ASCII characters.

A character encoding maps a character set to units of a specific width and defines byte serialization and ordering rules. Many character sets have more than one encoding. For example, Java programs can represent Japanese character sets using the EUC-JP or Shift-JIS encodings, among others. Each encoding has rules for representing and serializing a character set.

The ISO 8859 series defines 15 character encodings that can represent texts in dozens of languages. Each ISO 8859 character encoding can have up to 256 characters. ISO-8859-1 (Latin-1) comprises the ASCII character set, characters with diacritics (accents, diaereses, cedillas, circumflexes, and so on), and additional symbols.

UTF-8 (Unicode Transformation Format, 8-bit form) is a variable-width character encoding that encodes each Unicode character as one to four bytes. A byte in UTF-8 is equivalent to 7-bit ASCII if its high-order bit is zero; otherwise, the byte is part of a multibyte sequence that represents a single character. UTF-8 is compatible with the majority of existing web content and provides access to the Unicode character set. Current versions of browsers and email clients support UTF-8.
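The difference between a character set and its encodings can be seen by serializing the same characters with different encodings. The following is a minimal sketch; the class name EncodingDemo and its helpers are hypothetical, and the sample characters are written as Unicode escapes so that the source file itself stays pure ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Number of bytes needed to serialize the text in the given encoding.
    public static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Space-separated uppercase hex dump of the encoded bytes.
    public static String hex(String text, Charset charset) {
        StringBuilder dump = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            dump.append(String.format("%02X ", b & 0xFF));
        }
        return dump.toString().trim();
    }

    public static void main(String[] args) {
        String enye = "\u00f1";       // Latin small letter n with tilde
        String euro = "\u20ac";       // euro sign
        String clef = "\uD834\uDD1E"; // U+1D11E, written as a surrogate pair

        System.out.println(hex("A", StandardCharsets.UTF_8));        // 41
        System.out.println(hex(enye, StandardCharsets.ISO_8859_1));  // F1
        System.out.println(hex(enye, StandardCharsets.UTF_8));       // C3 B1
        System.out.println(hex(euro, StandardCharsets.UTF_8));       // E2 82 AC
        System.out.println(hex(clef, StandardCharsets.UTF_8));       // F0 9D 84 9E
    }
}
```

The ASCII letter keeps its single-byte value in UTF-8, while the other characters take two, three, and four bytes respectively.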
Many new web standards also specify UTF-8 as their character encoding. For example, UTF-8 is one of the two required encodings for XML documents (the other is UTF-16). Web components usually use PrintWriter to produce responses; by default, that PrintWriter encodes using ISO-8859-1. Servlets can also output binary data using OutputStream classes, which perform no encoding. An application that uses a character set that cannot use the default encoding must explicitly set a different encoding. For web components, three encodings must be considered: the request encoding, the page encoding, and the response encoding.

The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a web container will use the default encoding, ISO-8859-1, to parse request data. If the client hasn't set a character encoding and the request data is encoded with a different encoding from the default, the data won't be interpreted correctly. To remedy this situation, you can use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container. To control the request encoding from JSP pages, you can use the JSTL fmt:requestEncoding tag. You must call the method or tag before parsing any request parameters or reading any input from the request. Calling the method or tag once data has been read will not affect the encoding.
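The effect of parsing request data with the wrong encoding can be reproduced outside a container. The sketch below decodes the same percent-encoded form value twice with java.net.URLDecoder, once as UTF-8 and once as ISO-8859-1. It only imitates what a container does when it parses parameters; the class name RequestDecodingDemo and the sample value are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class RequestDecodingDemo {

    // Decodes a percent-encoded parameter value using the named encoding.
    public static String decode(String rawValue, String encoding) {
        try {
            return URLDecoder.decode(rawValue, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // A browser that submits the word "cafe" with an acute accent on the
        // final letter from a UTF-8 page sends the accented letter as %C3%A9.
        String raw = "caf%C3%A9";

        // Interpreted as UTF-8, the two bytes form one character.
        System.out.println(decode(raw, "UTF-8"));

        // Interpreted with the ISO-8859-1 default, each byte becomes its own
        // character, so the value is garbled.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

In a servlet, the corresponding fix is a call to setCharacterEncoding on the request before the first parameter is read.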

For JSP pages, the page encoding is the character encoding in which the file is encoded. For JSP pages in standard syntax, the page encoding is determined from the following sources:

1. The page encoding value of a JSP property group (see Setting Properties for Groups of JSP Pages) whose URL pattern matches the page.
2. The pageEncoding attribute of the page directive of the page. It is a translation-time error to name different encodings in the pageEncoding attribute of the page directive of a JSP page and in a JSP property group.
3. The CHARSET value of the contentType attribute of the page directive.

If none of these is provided, ISO-8859-1 is used as the default page encoding. For JSP pages in XML syntax (JSP documents), the page encoding is determined as described in section 4.3.3 and appendix F.1 of the XML specification. The pageEncoding and contentType attributes determine the page character encoding of only the file that physically contains the page directive. A web container raises a translation-time error if an unsupported page encoding is specified.
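The lookup rules for standard-syntax pages can be modeled in a few lines. This is a sketch of the rules as listed, assuming the three sources are consulted in the order given; it is not how any particular container implements them. The class PageEncodingResolver and its methods are hypothetical, and a null argument stands for a source that is not provided.

```java
import java.util.Locale;

public class PageEncodingResolver {

    public static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Resolves the page encoding from the property group value, the
    // pageEncoding attribute, and the contentType attribute, in that order.
    public static String resolve(String propertyGroupEncoding,
                                 String pageEncodingAttribute,
                                 String contentTypeAttribute) {
        // Naming different encodings in the property group and the page
        // directive is reported as an error (a translation-time error in JSP).
        if (propertyGroupEncoding != null && pageEncodingAttribute != null
                && !propertyGroupEncoding.equalsIgnoreCase(pageEncodingAttribute)) {
            throw new IllegalStateException("Conflicting page encodings: "
                    + propertyGroupEncoding + " vs " + pageEncodingAttribute);
        }
        if (propertyGroupEncoding != null) {
            return propertyGroupEncoding;
        }
        if (pageEncodingAttribute != null) {
            return pageEncodingAttribute;
        }
        String charset = charsetOf(contentTypeAttribute);
        return charset != null ? charset : DEFAULT_ENCODING;
    }

    // Extracts the charset parameter from a value such as
    // "text/html; charset=UTF-8", or returns null when there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String parameter = part.trim();
            if (parameter.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = parameter.substring("charset=".length()).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, "text/html; charset=UTF-8"));
        System.out.println(resolve(null, null, "text/html"));
    }
}
```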

The response encoding is the character encoding of the textual response generated by a web component. The response encoding must be set appropriately so that the characters are rendered correctly for a given locale. A web container sets an initial response encoding for a JSP page from the following sources:

1. The CHARSET value of the contentType attribute of the page directive.
2. The encoding specified by the pageEncoding attribute of the page directive.
3. The page encoding value of a JSP property group whose URL pattern matches the page.

If none of these is provided, ISO-8859-1 is used as the default response encoding. The javax.servlet.ServletResponse.setCharacterEncoding, javax.servlet.ServletResponse.setContentType, and javax.servlet.ServletResponse.setLocale methods can be called repeatedly to change the character encoding. Calls made after the servlet response's getWriter method has been called or after the response is committed have no effect on the character encoding. Data is sent to the response stream on buffer flushes (for buffered pages) or on encountering the first content on unbuffered pages. Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute. Calls to setLocale set the character encoding only if neither setCharacterEncoding nor setContentType has set the character encoding before. To control the response encoding from JSP pages, you can use the JSTL fmt:setLocale tag.
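Why the response encoding matters can be shown without a servlet container by writing text through a PrintWriter that wraps an encoder, which is roughly what the response writer does. The class ResponseEncodingDemo and its render method are hypothetical stand-ins for a response body, not part of the servlet API.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseEncodingDemo {

    // Writes the text through a PrintWriter that encodes with the given
    // charset and returns the bytes that would go out on the wire.
    public static byte[] render(String text, Charset responseEncoding) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(body, responseEncoding));
        out.print(text);
        out.flush();
        return body.toByteArray();
    }

    public static void main(String[] args) {
        String price = "\u20ac 10"; // euro sign followed by an amount

        // ISO-8859-1 has no euro sign, so the encoder substitutes '?'.
        System.out.println(new String(render(price, StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1));

        // UTF-8 can represent the character; it takes three bytes.
        System.out.println(render(price, StandardCharsets.UTF_8).length);
    }
}
```

With the default encoding the character is silently lost, which is why the encoding has to be set before the writer is obtained.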
