It should be simple. Our web site is currently translated into only a handful of languages, but we allow the visitor to pick from many different locales when viewing the site so that currencies, prices, dates and other locale-specific details can be displayed for the correct country and language. The idea is that we localize for most countries, but translate into only a few languages. We have solved this problem before and have implemented many web applications using this scheme, but for some reason we keep hitting problems and coming back to the issue. What is so hard about localization? Is it that we have the theory wrong, that our developers just need better training on the standards, or that localization is inherently hard? Let’s explore.
Internationalization is the process of designing software applications so that they can be adapted to various different cultures, regions and languages without engineering changes.
Localization is the process of adapting software for a specific culture, region or language by adding locale-specific meta-data and translating text.
Globalization is the process of doing both Internationalization and Localization so that the application has as wide a potential audience as possible.
Translation is the process of translating application text into different languages.
As software designers we are interested in the process of internationalization, which is all about making the software application easy to localize. This means much more than just saving and using some translated text. Here is an example:
American English: HURRY!!!! Your price of $10.00 expires on 1/18/2009 and we are quickly running out of your favorite color!
British English: Hurry! Your price of £6.00 expires on 18/1/2009 and we are quickly running out of your favourite colour.
Despite the fact that both texts are in English, there are several differences to consider:
- $10.00 vs £6.00 – There is a difference in both the currency sign and the amount. In some cases a conversion rate could be used; in other cases special pricing may be given by country or customer account. In other currencies the number of decimals may vary and the currency sign may be on the other side of the amount.
- 1/18/2009 vs 18/1/2009 – The American date format has the month first, followed by the day. Many European countries switch this order.
- “favorite” vs “favourite” – Many British words are spelled differently than their American counterparts.
- “HURRY!!!!” vs “Hurry!” – Americans require more excitement, let’s add a few more exclamation points. (What can I say!!!)
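The point about decimals varying by currency can be checked directly against the JDK's currency data. The following is a minimal sketch; the class and method names are made up for illustration, and only `java.util.Currency` is a real API.

```java
import java.util.Currency;

public class CurrencyDigits {
    // Number of decimal places conventionally used for an ISO 4217 currency code.
    static int fractionDigits(String currencyCode) {
        return Currency.getInstance(currencyCode).getDefaultFractionDigits();
    }

    public static void main(String[] args) {
        // US Dollars and Pounds Sterling use two decimals, Japanese Yen uses none.
        for (String code : new String[] {"USD", "GBP", "JPY"}) {
            System.out.println(code + " uses " + fractionDigits(code) + " decimal places");
        }
    }
}
```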
Because of the dynamic nature of pricing and expiration dates we don’t want to translate the text for all possible prices and dates, instead we want some kind of templating mechanism. There are several ways to create templates for text (or markup such as html) which we may explore in a different post, but here we will use substitution tags embedded in the translated text shown in brackets {}.
American English: HURRY!!!! Your price of {price} expires on {date} and we are quickly running out of your favorite color!
British English: Hurry! Your price of {price} expires on {date} and we are quickly running out of your favourite colour.
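As a rough illustration of how named tags like {price} and {date} could be filled in, here is a minimal sketch. The `NamedTemplate` class and its `render` method are hypothetical helpers written for this post, not part of any standard API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedTemplate {
    // Replace each {name} tag with its value; tags without a value are left as-is.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace is a literal replacement, so "$" in a value is safe.
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("price", "$10.00");
        values.put("date", "1/18/2009");
        String template = "HURRY!!!! Your price of {price} expires on {date}";
        System.out.println(render(template, values));
    }
}
```

One limitation of this naive approach: a substituted value that itself contains a tag may be replaced again by a later entry, so a production templating mechanism would need to be more careful.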
Now we can get the price and expiration date from a pricing system, perform the localization using standard APIs and then render the completed text. In Java there are several internationalization APIs to help us out. First, we must distinguish between American and British English using a Java locale. Here we create a locale for American English and grab the country from the locale:
Locale locale = new Locale("en","US"); // American English
String shipCountry = locale.getCountry();
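A locale can also be obtained from a language tag such as "en-US" or from one of the predefined constants, which avoids the `Locale` constructors (deprecated in recent JDKs). A small sketch follows; the `countryOf` helper is an illustrative name, not an existing API.

```java
import java.util.Locale;

public class LocaleBasics {
    // Extract the country part of a language tag such as "en-US".
    static String countryOf(String languageTag) {
        return Locale.forLanguageTag(languageTag).getCountry();
    }

    public static void main(String[] args) {
        Locale american = Locale.forLanguageTag("en-US");
        Locale british = Locale.UK; // predefined constant for en-GB
        System.out.println(american.getLanguage() + " + " + american.getCountry());
        System.out.println(british.toLanguageTag());
        System.out.println(countryOf("en-MX"));
    }
}
```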
Next we get the pricing information from a pricing system. Pricing systems may take in various parameters, but here we assume price is based on customer, product and the country that the product will be shipped to. Let's assume the pricing system does a lookup on these three parameters in a table and returns an object containing the price, the currency, and the date that the price expires. It is safest to let the pricing system return the currency rather than depending on another localization API, so we are sure that the amount matches the currency.
// get price information from our custom pricing system
PriceData priceData = getPrice(customerId, productId, shipCountry);
Float price = priceData.price;
Currency currency = priceData.currency;
Date expiration = priceData.expiration;
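The pricing system itself is outside the scope of this post, but a table lookup of the kind described could be sketched as follows. Everything here is an assumption for illustration: the `PriceData` fields, the sample rows and the customer and product IDs are invented, and this sketch uses `BigDecimal` and `LocalDate` where the snippet above uses `Float` and `Date`.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Currency;
import java.util.HashMap;
import java.util.Map;

public class PricingSketch {
    // Hypothetical result object: amount, its currency, and when the price expires.
    static final class PriceData {
        final BigDecimal price;
        final Currency currency;
        final LocalDate expiration;

        PriceData(String price, String currencyCode, LocalDate expiration) {
            this.price = new BigDecimal(price);
            this.currency = Currency.getInstance(currencyCode);
            this.expiration = expiration;
        }
    }

    // In-memory stand-in for the pricing table, keyed by customer, product and country.
    private static final Map<String, PriceData> TABLE = new HashMap<>();
    static {
        LocalDate expires = LocalDate.of(2009, 1, 18);
        TABLE.put(key("C1", "P1", "US"), new PriceData("10.00", "USD", expires));
        TABLE.put(key("C1", "P1", "GB"), new PriceData("6.00", "GBP", expires));
        TABLE.put(key("C1", "P1", "MX"), new PriceData("99.00", "MXN", expires));
    }

    static String key(String customerId, String productId, String shipCountry) {
        return customerId + "|" + productId + "|" + shipCountry;
    }

    // The amount and the currency always come back together, so they cannot drift apart.
    static PriceData getPrice(String customerId, String productId, String shipCountry) {
        PriceData data = TABLE.get(key(customerId, productId, shipCountry));
        if (data == null) {
            throw new IllegalArgumentException("No price for " + key(customerId, productId, shipCountry));
        }
        return data;
    }

    public static void main(String[] args) {
        PriceData data = getPrice("C1", "P1", "MX");
        System.out.println(data.price + " " + data.currency.getCurrencyCode() + " until " + data.expiration);
    }
}
```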
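Now we can use the DateFormat class to format the date to conform to American English conventions. The SHORT format selects the numeric-only date format (e.g. 1/18/2009; the exact pattern, including how many year digits are shown, depends on the JDK's locale data).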
DateFormat dateFormatter = DateFormat.getDateInstance(DateFormat.SHORT, locale);
String dateText = dateFormatter.format(expiration);
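To see how the same date comes out for different locales, a self-contained sketch along these lines can be run. The class and method names are illustrative, and the printed patterns may differ slightly between JDK versions because they come from the JDK's locale data.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class ShortDates {
    // Format a date with the locale's all-numeric SHORT style.
    static String shortDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        // January 18, 2009 at midnight in the default time zone.
        Date expiration = new GregorianCalendar(2009, Calendar.JANUARY, 18).getTime();
        System.out.println("en-US: " + shortDate(expiration, Locale.US)); // month first
        System.out.println("en-GB: " + shortDate(expiration, Locale.UK)); // day first
    }
}
```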
And here we use the NumberFormat class to convert the price into a string. Note that the currency formatter for numbers in Java takes in a locale and chooses the currency based on that locale, but our pricing routine also chose the currency based on its own rules. The pricing routine must take precedence since it also returns the amount, so in this simple example we throw an exception if the two currencies are not equal.
NumberFormat currencyFormatter = NumberFormat.getCurrencyInstance(locale);
String priceText = currencyFormatter.format(price);
if (!currencyFormatter.getCurrency().equals(priceData.currency)) {
    // This shouldn't happen: the locale's currency disagrees with the pricing system
    throw new RuntimeException("Currency mismatch.");
}
Then finally we apply the localized template. Assume that the getTemplate() method returns the template text based on the locale. Java provides a nice template substitution and localization facility in the MessageFormat class, but it requires that we change the template text to use “{0}” and “{1}” instead of “{price}” and “{date}”.
String template = getTemplate(locale);
// ex: HURRY!!!! Your price of {0} expires on {1} and we are quickly running out of your favorite color!
String message = MessageFormat.format(template,priceText,dateText);
System.out.println(message);
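MessageFormat can also format the arguments itself when the placeholders carry a type, for example {0,number,currency}. The sketch below is one possible variation, not the approach used above; the `TypedMessage` class is invented for illustration. Note two caveats: the currency is then chosen from the locale rather than from the pricing system, and a literal apostrophe in a MessageFormat pattern has to be written as two single quotes.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class TypedMessage {
    // Let MessageFormat apply the locale's number and date formatting to the arguments.
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // '' produces a single apostrophe; a lone ' would start a quoted section.
        String pattern =
            "Your price of {0,number,currency} expires on {1,date,medium}. Don''t wait!";
        Date expiration = new GregorianCalendar(2009, Calendar.JANUARY, 18).getTime();
        System.out.println(render(pattern, Locale.US, 10.00, expiration));
        System.out.println(render(pattern, Locale.UK, 6.00, expiration));
    }
}
```

Pre-formatting the price and date as strings, as in the snippet above, keeps the pricing system's currency authoritative, which is why that approach fits this example better.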
The result is that the proper localized text is printed. One final challenge is the possible mismatch between the locale and the customer’s actual shipping address. The customer may have chosen “en-US” as the locale so that the text would display in English, but they actually intend to ship the product to Mexico where it will be put into use. This is an issue because we have specified that pricing is to be based on the shipping address rather than the choice of locale (as is the case for many pricing systems). The pricing functionality must match the localization or there is a danger of prices being quoted incorrectly. We may not actually know the shipping address until the customer goes to checkout to purchase the product. At checkout the price would need to be recalculated and displayed for the correct country, which in our example is Mexico. This means re-executing the getPrice() method with the new shipping address country and applying the currency formatter for English but in Mexican Pesos:
// At checkout customer chooses a Mexican shipping address
shipCountry = "MX";
Locale shipLocale = new Locale("en", shipCountry);
priceData = getPrice(customerId, productId, shipCountry);
currencyFormatter = NumberFormat.getCurrencyInstance(shipLocale);
priceText = currencyFormatter.format(priceData.price);
System.out.println(priceText);
Re-rendering the template with this new price text would output the following:
HURRY!!!! Your price of MXN99 expires on 1/18/2009 and we are quickly running out of your favorite color!
The text is localized in English, and the amount, while displayed in an English format, is quoted in Mexican Pesos. We need the currency formatter to follow these rules, so we have done this by creating a hybrid “en-MX” locale, which seems to work fine in this case. The unfortunate issue is that the currency formatter in Java mixes the concept of locale with the concept of currency. This code would need to be properly tested for all currencies used.
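An alternative to the hybrid locale, shown here only as a sketch, is to keep the display locale and explicitly set the formatter's currency to the one returned by the pricing system using `NumberFormat.setCurrency`. The `CurrencyOverride` class is invented for illustration, and the symbol or code printed for a foreign currency depends on the JDK's locale data.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencyOverride {
    // Format an amount using the display locale's conventions but the given currency.
    static String formatPrice(BigDecimal amount, Currency currency, Locale displayLocale) {
        NumberFormat formatter = NumberFormat.getCurrencyInstance(displayLocale);
        formatter.setCurrency(currency);
        // setCurrency does not adjust the decimals, so align them with the currency.
        int digits = currency.getDefaultFractionDigits();
        formatter.setMinimumFractionDigits(digits);
        formatter.setMaximumFractionDigits(digits);
        return formatter.format(amount);
    }

    public static void main(String[] args) {
        Locale display = Locale.US; // the customer reads American English
        System.out.println(formatPrice(new BigDecimal("99"), Currency.getInstance("MXN"), display));
        System.out.println(formatPrice(new BigDecimal("1200"), Currency.getInstance("JPY"), display));
    }
}
```

With this approach the locale only controls presentation (grouping, decimal separator, symbol placement), while the currency always comes from the pricing data. It would still need testing for every currency in use.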
One common technique for localization that we have seen in this example is the separation of the translation of text from the locale-based formatting of variable data. In practice this is a good idea since it cuts down on the amount of text that will have to be translated into different languages. In the example we were able to display a price for Mexico in Pesos without having to translate the text into Spanish. This means that we can marginally serve Mexican customers even before we have the Spanish translations in place, something which is quite important for a global company that has customers in many countries. Also, in practice we may not want to take on the expense of translating into both American and British English since the following text is acceptable to British English speakers:
English: HURRY!!!! Your price of £6.00 expires on Jan 18, 2009 and we are quickly running out of your favorite color!
Notice one change here, however. The date format has been changed to eliminate the possibility of ambiguity. Generally speaking, an all-numeric date format such as dd/mm/yyyy (DateFormat.SHORT in Java) is a bad idea, because a reader cannot tell whether 1/2/2009 means January 2 or 1 February. Most systems will use some format with the month spelled out, and of course nobody displays only 2 characters in the year any more.
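In Java, one way to get a date with the month spelled out is DateFormat.MEDIUM, which produces text like "Jan 18, 2009" for American English. A minimal sketch follows; the class and method names are illustrative, and the exact punctuation varies with the locale and the JDK's locale data.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class UnambiguousDates {
    // MEDIUM style uses an abbreviated month name instead of a month number.
    static String mediumDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(date);
    }

    public static void main(String[] args) {
        Date expiration = new GregorianCalendar(2009, Calendar.JANUARY, 18).getTime();
        System.out.println("en-US: " + mediumDate(expiration, Locale.US));
        System.out.println("en-GB: " + mediumDate(expiration, Locale.UK));
    }
}
```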
In Part II we will cover more on how to manage the translated text.