Internationalization and Localization with PHP
Message Objects
These phrases can also be stored as function return values instead of
strings in an array. Storing the phrases as functions removes the need to use
printf(). Functions that return a sentence look like this:
<?php
// English version
function i_am_X_years_old($age) {
    return "I am $age years old.";
}

// Spanish version (kept in a separate file, because PHP cannot declare
// the same function twice in one script)
function i_am_X_years_old($age) {
    return "Tengo $age años.";
}
?>
If some parts of the message catalog belong in an array, and some parts belong in functions, an object is a helpful container for a language's message catalog. A base object and two simple message catalogs look like this:
<?php
class pc_MC_Base {
    var $messages;
    var $lang;

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,message:'$s'");
        }
    }
}

class pc_MC_es_US extends pc_MC_Base {
    // PHP 5 and later; in PHP 4, name the constructor after the class.
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array(
            'chicken' => 'pollo',
            'cow'     => 'vaca',
            'horse'   => 'caballo'
        );
    }

    function i_am_X_years_old($age) {
        return "Tengo $age años.";
    }
}

class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array(
            'chicken' => 'chicken',
            'cow'     => 'cow',
            'horse'   => 'horse'
        );
    }

    function i_am_X_years_old($age) {
        return "I am $age years old.";
    }
}
?>
Each message catalog object extends the pc_MC_Base class to get the
msg() method, and then defines its own messages (in its
constructor) and its own functions that return phrases. Here's how to print
text in Spanish:
<?php
$MC = new pc_MC_es_US;
print $MC->msg('cow');
print $MC->i_am_X_years_old(15);
?>
To print the same text in English, $MC just needs to be
instantiated as a pc_MC_en_US object instead of a pc_MC_es_US object. The rest
of the code remains unchanged.
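One way to pick the catalog at run time is to map a locale string to a class name. The sketch below is only an illustration: the pc_load_catalog() helper and its fallback to en_US are assumptions, not part of the code above, and the catalogs are trimmed to a single message to keep the example short. It uses __construct(), so it needs PHP 5 or later.

```php
<?php
// Minimal stand-ins for the catalog classes described in the article.
class pc_MC_Base {
    public $messages = array();
    public $lang;

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        }
        error_log("l10n error:LANG:$this->lang,message:'$s'");
        return null;
    }
}

class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array('cow' => 'cow');
    }
}

class pc_MC_es_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array('cow' => 'vaca');
    }
}

// Hypothetical helper: build the catalog for $locale, or fall back to
// $default when no catalog class exists for that locale.
function pc_load_catalog($locale, $default = 'en_US') {
    // Accept only names shaped like "es_US" before building a class name.
    if (!preg_match('/^[a-z]{2}_[A-Z]{2}$/', $locale)) {
        $locale = $default;
    }
    $class = "pc_MC_$locale";
    if (!class_exists($class)) {
        $class = "pc_MC_$default";
    }
    return new $class;
}

$MC = pc_load_catalog('es_US');
print $MC->msg('cow') . "\n";
```

The locale string would typically come from a user preference or a session value. Checking its shape before using it in a class name keeps arbitrary input from selecting an unrelated class.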
Localizing Images
Images need to be localized when you want to display images containing text in locale-appropriate languages.
Make an image directory for each locale you want to support, as well as a
global image directory for images that have no locale-specific information.
Create copies of each locale-specific image in the appropriate directories.
Make sure that these images have the same filenames. Instead of printing image
URLs directly, use a wrapper method similar to the msg() method
demonstrated earlier.
The img() wrapper method looks for a locale-specific version
of an image first, then a global one. If neither is present, it logs an error
message. Building upon the pc_MC_Base class, the new class looks like this:
<?php
class pc_MC_Base {
    var $messages;
    var $images;
    var $lang;
    var $image_base_path = '/usr/local/www/images';
    var $image_base_url = '/images';

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,message:'$s'");
        }
    }

    // Returns the image URL so callers can pass it to printf() or print it.
    function img($f) {
        if (is_readable("$this->image_base_path/" .
                        "$this->lang/$f")) {
            return "$this->image_base_url/$this->lang/$f";
        } elseif (is_readable("$this->image_base_path/" .
                              "global/$f")) {
            return "$this->image_base_url/global/$f";
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,image:'$f'");
        }
    }
}
?>
The img() method needs to know both the path to the image file
in the filesystem ($image_base_path) and the path to the image
from the base URL of your site ($image_base_url). It uses the
first to test if the file can be read and the second to construct an
appropriate URL for the image.
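To see the lookup order in action without touching a real document root, the following sketch builds a throwaway image tree in the system temporary directory. The standalone pc_image_url() function and the demo filenames are assumptions made for this illustration; the function mirrors the locale-then-global search described above but takes its paths as arguments and returns the URL.

```php
<?php
// Hypothetical standalone version of the lookup: returns the URL of the
// locale-specific image if it is readable, then the global one, else null.
function pc_image_url($base_path, $base_url, $lang, $f) {
    foreach (array($lang, 'global') as $dir) {
        if (is_readable("$base_path/$dir/$f")) {
            return "$base_url/$dir/$f";
        }
    }
    error_log("l10n error:LANG:$lang,image:'$f'");
    return null;
}

// Build a throwaway image tree under the system temp directory.
$base = sys_get_temp_dir() . '/pc_img_demo_' . uniqid();
mkdir("$base/es_US", 0777, true);
mkdir("$base/global", 0777, true);
touch("$base/es_US/new.gif");   // localized starburst
touch("$base/global/logo.gif"); // no text, shared by every locale

print pc_image_url($base, '/images', 'es_US', 'new.gif') . "\n";
print pc_image_url($base, '/images', 'es_US', 'logo.gif') . "\n";
```

The first call finds the es_US copy; the second falls through to the global directory because no localized logo exists.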
A localized image must have the same filename in each localization
directory. For example, an image that says "New!" on a yellow starburst should
be called new.gif in both the images/en_US directory
and the images/es_US directory, even though the file
images/es_US/new.gif is a picture of a yellow starburst with the
word "¡Nuevo!" on it. Don't forget that the alt text you display in your image
tags also needs to be localized. A complete localized <img> tag
looks like this:
<?php
$MC = new pc_MC_es_US;
printf('<img src="%s" alt="%s">',
$MC->img('cancel.png'), $MC->msg('Cancel'));
?>
If the localized versions of a particular image have varied dimensions, store image height and width in the message catalog as well:
<?php
printf('<img src="%s" alt="%s" ' .
'height="%d" width="%d">',
$MC->img('cancel.png'), $MC->msg('Cancel'),
$MC->msg('img-cancel-height'),
$MC->msg('img-cancel-width'));
?>
The localized messages for img-cancel-height and img-cancel-width are not
text strings, but integers that describe the dimensions of the
cancel.png image in each locale.
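As a rough illustration, the catalog entries behind that tag might look like the following. The pixel sizes, the Spanish label, and the pc_cancel_tag() helper are all invented for this sketch; only the key names come from the example above.

```php
<?php
// Assumed catalog entries; the pixel sizes are invented for illustration.
$messages_es_US = array(
    'Cancel'            => 'Cancelar',
    'img-cancel-height' => 24,
    'img-cancel-width'  => 96,
);
$messages_en_US = array(
    'Cancel'            => 'Cancel',
    'img-cancel-height' => 24,
    'img-cancel-width'  => 72,
);

// Hypothetical helper that formats the tag from one locale's entries.
function pc_cancel_tag($src, $messages) {
    return sprintf('<img src="%s" alt="%s" height="%d" width="%d">',
        $src, htmlspecialchars($messages['Cancel']),
        $messages['img-cancel-height'], $messages['img-cancel-width']);
}

print pc_cancel_tag('/images/es_US/cancel.png', $messages_es_US) . "\n";
```

Because the dimensions live next to the translated text, a wider Spanish button needs no change to the page code.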
If you use a consistent naming convention for your variable and file names,
create an imgsrc() method to simplify matters:
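<?php
// Add this method to pc_MC_Base. It relies on img() returning the
// image URL rather than printing it.
function imgsrc($img) {
    $src = $this->img("$img.png");
    $alt = $this->msg(ucfirst($img));
    // Dimension keys use the image name, e.g. img-cancel-height
    $height = $this->msg("img-$img-height");
    $width = $this->msg("img-$img-width");
    return sprintf('<img src="%s" alt="%s" ' .
                   'height="%d" width="%d">',
                   $src, $alt, $height, $width);
}
?>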
To get the same results as the earlier Cancel button example, call it like this:
<?php
$MC = new pc_MC_es_US;
print $MC->imgsrc('cancel');
?>
Conclusion
With the help of the msg() and img() methods, you can
quickly create message objects that allow you to localize your Web site using
100 percent pure PHP. Because it's an all-PHP solution, you can reuse all your
existing code, and you don't need to install any new extensions. However, if you
need to share message catalogs among many applications, PHP supports gettext.
See Joao Prado Maia's article for more details on using gettext with PHP.
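For a rough idea of how the two approaches can coexist, here is a minimal sketch of a lookup that prefers gettext when the extension is loaded and falls back to a plain array otherwise. gettext(), setlocale(), bindtextdomain(), and textdomain() are functions from PHP's gettext extension; the pc_translate() wrapper, the 'messages' domain name, and the ./locale directory are assumptions made for this example.

```php
<?php
// Hypothetical wrapper: use gettext when the extension is loaded and
// fall back to a plain array otherwise.
function pc_translate($s, $fallback = array()) {
    if (function_exists('gettext')) {
        $t = gettext($s);
        // gettext() returns the original string when it has no translation.
        if ($t !== $s) {
            return $t;
        }
    }
    return isset($fallback[$s]) ? $fallback[$s] : $s;
}

if (function_exists('gettext')) {
    // Assumed layout: ./locale/es_US/LC_MESSAGES/messages.mo
    putenv('LC_ALL=es_US');
    setlocale(LC_ALL, 'es_US');
    bindtextdomain('messages', './locale');
    textdomain('messages');
}

print pc_translate('cow', array('cow' => 'vaca')) . "\n";
```

With no compiled catalog in place, the call falls back to the array. Once a messages.mo file exists for the locale, the same call would return the gettext translation instead.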
As you can see, internationalizing your PHP applications is not a labor of Hercules. When you organize your localizations within an object hierarchy, it's easy to extend your classes to support new countries and regions.
Adam Trachtenberg is the manager of technical evangelism for eBay and is the author of two O'Reilly books, "Upgrading to PHP 5" and "PHP Cookbook." In February he will be speaking at Web Services Edge 2005 on "Developing E-Commerce Applications with Web Services" and at the O'Reilly booth at LinuxWorld on "Writing eBay Web Services Applications with PHP 5."
Showing messages 1 through 7 of 7.
-
Non-technical aspects of i18n
2002-12-20 13:45:15 Adam Trachtenberg
You're completely right about separating text from code. I'd never work on a project that mixed the two. (And I have developed multi-lingual web sites. The code here is based on "real life;" it wasn't written specifically for PHP Cookbook.)
Unfortunately, the internationalization process is extremely complex. If you try to cover everything in one article, it'll end up running on forever. In this article, I decided to focus on the technical aspects. In order not to let the other details obscure my explanation of the code, I was forced to simplify other parts.
I've written up some other thoughts on how best to organize files and handle the actual process of getting the text into the system. I posted them on my web log. Check them out and please add comments and suggestions.
PS: To the person who felt that my choice not to use Greek and Japanese in the article indicates a deficit in LAMP: you're wrong. Non-Western characters don't always render correctly in all browsers. An article that people can't read isn't very helpful, so I made the choice to stick to English and Spanish. You can read more on PHP's multi-byte string support in the manual.
-
Non-technical aspects of i18n
2007-07-22 08:14:03 TheCaveman
Do you have another URL to try? It is quite a while later now and your blog is still inaccessible even after signing up for the O'Reilly Network. Thanks.
-
Non-technical aspects of i18n
2005-11-21 08:45:26 StarsInTheSky
Hello, I cannot access your blog with the given link. Even after creating an O'Reilly account and logging in, access is restricted. Would it be possible to provide a public link (instead of http://www.oreillynet.com/cs/weblog/view/wlg/2470)?
-
even more mistaken than that...
2002-12-11 17:56:42 anonymous2
The article starts off talking about the problem of mixing French, Greek, and Japanese, then proceeds to demonstrate how to mix English with English, then for the advanced class, how to mix English and Spanish, and even that is done by hard coding strings into the source code.
The LAMP platform is about the closest you can get to worst case scenario for anyone who does real internationalization for a living.
What the author should have said was that if you need to create an app for a small organization with no international aspirations, LAMP may be acceptable, but otherwise you'd need to change to more professional tools like Java, .Net, Oracle, PostgreSQL, SQLServer, etc.
-
oh, are you mistaken...
2002-12-11 02:48:13 kirkmc
As a translator, I shudder when I read articles like this. It is the type of short-sighted attitude that makes our work so hard.
As a rule, you should _never_ put strings of text that are to be localized in code of any kind. Not only does it require the translator to work in the code, taking far longer than necessary to translate simple texts, but it runs the risk of the code getting damaged in the translation process. While some translators know enough about code to work in it, most don't. The last thing you want is for your translated code to come back and find it doesn't work. Or not find it doesn't work, until you discover your site is malfunctioning.
The best way to work with localization is to have separate files for each language, which may contain variables and definitions, but which won't contain any real code. Each translator can then work on their language file, which you just roll back into your database.
Kirk
-
Wrong perspective
2003-03-28 13:18:08 Chris Shiflett
It's quite short-sighted to try to launch a semantic argument like this against the merits of the article. Your logic seems to rely completely on whether this article is a tutorial or a real-world example. It is a tutorial, even though it may be based on a real-world example.
For example, take this:
$messages = array (
'en_US'=> array(
'My favorite foods are' =>
'My favorite foods are',
'french fries' => 'french fries',
'biscuit' => 'biscuit',
'candy' => 'candy',
'potato chips' => 'potato chips',
'cookie' => 'cookie',
'corn' => 'corn',
'eggplant' => 'eggplant'
),
'en_GB'=> array(
'My favorite foods are' =>
'My favourite foods are',
'french fries' => 'chips',
'biscuit' => 'scone',
'candy' => 'sweets',
'potato chips' => 'crisps',
'cookie' => 'biscuit',
'corn' => 'maize',
'eggplant' => 'aubergine'
)
);
This is just example code to create the array that he demonstrates how to use. This array doesn't have to be hard-coded, but this article's scope isn't about creating a friendly interface for translators, it's about internationalization for programmers. It is trivial to make a nice little Web application that provides a friendly data entry interface for translators that stores the information in a database. This database can be used to create the array.
There are plenty of tutorials that demonstrate how to interact with a database. This one is about internationalization and localization.
-
So, I created the standard gettext() directory structure:
/locale/<LANGUAGE_STRING>/LC_MESSAGES
Under LC_MESSAGES I created pipe-delimited files called messages.txt with the word or phrase pairings. For example:
// computer terms
computer|computadora
hard drive|disco duro
monitor|monitor
I then wrote a PHP script to parse this into an array of the format suggested in the article and write it to a file called "locale.inc.php", which I include in my top-level include file. (For performance reasons, I didn't want to parse the file every time I needed a translation.) The script is designed to run as a cron job, so any changes to the text files are reflected in the array. Here is the script:
<?php
/* $Id$
 *
 * Author: Ken Riley <kenr@nodots.com>
 * Copyright (C) 2004 Nodots Development, Inc. All Rights Reserved.
 * <http://www.nodots.com/>
 *
 * Description: Reads pseudo-i18n files and generates arrays. Designed to run
 * as a cron process.
 */
include('include/GLOBALS.inc.php');

$dh = dir(LOCALE_DIR);
$outFile = BASE_DIR."/include/locale.php";
$localeFile = fopen($outFile,"w+");
writeHeader($localeFile);
$languageCt = 0;
while (false !== ($file = $dh->read())) {
    if ($file != "." && $file != "..") {
        $languageFile = LOCALE_DIR."/".$file."/LC_MESSAGES/messages.txt";
        fwrite($localeFile,"\t'$file'=> array(\n");
        $fh = fopen($languageFile,"r");
        $s = "";
        while ( feof($fh) === false ) {
            $line = chop(fgets($fh));
            // Skip comments, blank lines, and lines without a delimiter.
            if ( substr($line,0,2) != "//" and strpos($line,"|") !== false ) {
                // Split on the first "|" only, and escape quotes and
                // backslashes so the generated PHP stays valid.
                list($english, $translated) = explode("|",$line,2);
                $english = addcslashes($english,"'\\");
                $translated = addcslashes($translated,"'\\");
                $s .= "\t\t'$english' => '$translated',\n";
            }
        }
        // Strip the trailing ",\n" left by the last entry.
        $cleanS = rtrim($s,",\n");
        $cleanS .= "\n\t),\n";
        fwrite($localeFile,$cleanS);
        $languageCt++;
        fclose($fh);
    }
}
writeFooter($localeFile);
$dh->close();

function writeHeader($fh) {
    $timestamp = date("d-m-Y G:i:s");
    $headerString = "<?php\n";
    $headerString .= "// created ".$timestamp." by buildLanguageFile.php\n\n";
    $headerString .= "\$messages = array (\n";
    fwrite($fh,$headerString);
}

function writeFooter($fh) {
    $footerString = ");\n";
    $footerString .= "?>";
    fwrite($fh,$footerString);
}
?>
I then added the author's msg($s) function into my top-level include file. If I have a client with access to gettext(), I just modify that function to use gettext() rather than the array.
So, you have simple text files for the translator, a process for automatically slurping those into an array, and a clear path to full-fledged i18n support in your app.
Provecho,
Ken Riley
Nodots Development, Inc.