Localisation

For the Wikimedia Foundation localisation team, see Wikimedia Language engineering.
For translating pages on this wiki, see Project:Language policy.

This page gives a technical description of MediaWiki's internationalisation and localisation (i18n and L10n) system, and gives hints that coders should be aware of. Our mantra is that i18n must not be an afterthought: it is an essential component from the earliest phases of your software, as well as one of the core principles of MediaWiki.

Translation resources

translatewiki.net

translatewiki.net supports in-wiki translation of all the messages of core MediaWiki and of the extensions. If you would like to have nothing to do with all the technicalities in this page about editing files, Git, creating patches, and so forth, go directly to translatewiki.net.

All translation of MediaWiki user interface messages should go through translatewiki.net and should not be committed directly to the code. Only the English messages and their initial documentation must be written in the source code.

Core MediaWiki and extensions must use system messages for any text displayed in the user interface. For an example of how to do this, please see Manual:Special pages. If the extension is well written, it will probably be included in translatewiki.net in a few days, after its staff notices it on Gerrit. If it's not noticed, contact them. If it's too unstable to be translated, note so in the code or commit and contact them if necessary.

See also Overview of the localisation system and What can be localised.

Finding messages

Help:System message explains how to find a particular string you want to translate. In particular, note the qqx trick, which was introduced in MediaWiki 1.18.

i18n mailing list

You can subscribe to the i18n list. At the moment it is low-traffic.

Code structure

First, you have a Language object in Language.php. This object contains all the localisable message strings, as well as other important language-specific settings and custom behaviour (uppercasing, lowercasing, printing dates, formatting numbers, direction, custom grammar rules etc.).

The object is constructed from two sources: sub-classed versions of itself (classes) and Message files (messages).

There's also the MessageCache class, which handles input of text via the MediaWiki namespace. Most internationalisation is nowadays done via Message objects and by using the wfMessage() shortcut function, which is defined in includes/GlobalFunctions.php. Legacy code might still be using the old wfMsg*() functions, which are now considered deprecated in favour of the above-mentioned Message objects.

General use (for developers)

See also Manual:Messages API.

Language objects

There are two ways to get a language object. You can use the globals $wgLang and $wgContLang for user interface and content language respectively. For an arbitrary language you can construct an object by using Language::factory( 'en' ), by replacing en with the code of the language. You can also use wfGetLangObj( $code ); if $code could already be a language object. The list of codes is in languages/Names.php.

Language objects are needed for doing language-specific functions, most often to do number, time and date formatting, but also to construct lists and other things. There are multiple layers of caching and merging with fallback languages, but the details are irrelevant in normal use.

Using messages

MediaWiki uses a central repository of messages which are referenced by keys in the code. This is different from, for example, Gettext, which just extracts the translatable strings from the source files. The key-based system makes some things easier, like refining the original texts and tracking changes to messages. The drawback is of course that the list of used messages and the list of source texts for those keys can get out of sync. In practice this isn't a big problem, and the only significant problem is that sometimes extra messages that are not used anymore still stay up for translation.

To make message keys more manageable and easy to find, also with grep, always write them completely and don't rely too much on creating them dynamically. You may concatenate parts of message keys if you feel that it gives your code better structure, but put a comment nearby with a list of the possible resulting keys. For example:

// Messages that can be used here:
// * myextension-connection-success
// * myextension-connection-warning
// * myextension-connection-error
$text = wfMessage( 'myextension-connection-' . $status )->parse();

The detailed use of message functions in PHP and JavaScript is on Manual:Messages API.

Adding new messages

See also: Localisation file format
  1. Decide a name (key) for the message. Try to follow global or local conventions for naming. For extensions, use a standard prefix, preferably the extension name in lower case, followed by a hyphen ("-"). Stick to lower case letters, numbers and dashes in message names; most other characters are either impractical or do not work at all. See also Manual:Coding conventions#System messages.
  2. Make sure that you are using suitable handling for the message (parsing, {{-replacement, escaping for HTML, etc.)
    • If your message is part of core add it to languages/i18n/en.json
    • If your message is in an extension add it to the i18n/en.json file or the en.json file in the appropriate subdirectory. If an extension has a lot of messages, you may create subdirectories under i18n and list them in the $wgMessagesDirs variable.
  3. Take a pause and consider the wording of the message. Is it as clear as possible? Can it be misunderstood? Ask for comments from other developers or localisers if possible. Follow the Internationalisation hints.
  4. Add documentation to qqq.json in the same directory. Read more about message documentation.
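The relationship between en.json and qqq.json in steps 2 and 4 can be illustrated with a minimal, self-contained sketch. It is not MediaWiki code: the myextension-* keys, their texts and the missingDocumentation() helper are hypothetical, and in a real extension the two arrays would be read from the JSON files rather than written inline.

```php
<?php
// Hypothetical contents of i18n/en.json (English source messages).
$en = [
	'myextension-connection-success' => 'Connected to $1.',
	'myextension-connection-error' => 'Could not connect to $1.',
];

// Hypothetical contents of i18n/qqq.json (message documentation).
$qqq = [
	'myextension-connection-success' => 'Shown after a successful connection. $1 is the server name.',
];

/**
 * Return the message keys that have no documentation entry.
 * (Illustrative helper, not part of MediaWiki.)
 */
function missingDocumentation( array $messages, array $documentation ): array {
	return array_values(
		array_diff( array_keys( $messages ), array_keys( $documentation ) )
	);
}

// Lists the keys that still need a qqq entry.
$missing = missingDocumentation( $en, $qqq );
echo json_encode( $missing ), "\n";
```

A check along these lines could be run before committing, so that a new message is never added without its documentation.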

Messages that should not be translated

  1. Ignored messages are those which should exist only in the English messages file. They are messages that should not need translation, because they reference only other messages or language-neutral features, e.g. a message of '{{SITENAME}}'.
  2. Optional messages may be translated only if changed in the target language.

To flag such messages:

Removing existing messages

Remove it from en.json and qqq.json. Don't bother with other languages. Updates from translatewiki.net will handle those automatically.

Changing existing messages

  1. Consider updating the message documentation (see Adding new messages).
  2. Change the message key if old translations are not suitable for the new meaning. This also includes changes in message handling (parsing, escaping, parameters, etc.). Improving the phrasing of a message without technical changes is usually not a reason for changing a key. At translatewiki.net, the translations will be marked as outdated so that they can be targeted by translators. If in doubt, ask in #mediawiki-i18n or in the support page at translatewiki.net.
  3. If the extension is supported by translatewiki.net, please only change the English source message and/or key, and the accompanying entry in qqq.json. If needed, the translatewiki.net team will take care of updating the translations, marking them as outdated, cleaning up the file or renaming keys where possible. This also applies when you're only changing things like HTML tags which you could change in other languages without speaking those languages. Most of these actions will take place in translatewiki.net and will reach Git with about one day of delay.

Localising namespaces and special page aliases

Namespaces and special page names (e.g. 'RecentChanges' in 'Special:RecentChanges') are also translatable.

Namespaces

Currently[1], translating namespace names is disabled on translatewiki.net, so you need to do this yourself in Gerrit, or file a Phabricator task asking for someone else to do it.

To allow custom namespaces introduced by your extension to be translated, create a MyExtension.namespaces.php file that looks like this:

<?php
/**
 * Translations of the namespaces introduced by MyExtension.
 *
 * @file
 */

$namespaceNames = array();

// For wikis where the MyExtension extension is not installed.
if( !defined( 'NS_MYEXTENSION' ) ) {
	define( 'NS_MYEXTENSION', 2510 );
}

if( !defined( 'NS_MYEXTENSION_TALK' ) ) {
	define( 'NS_MYEXTENSION_TALK', 2511 );
}

/** English */
$namespaceNames['en'] = array(
	NS_MYEXTENSION => 'MyNamespace',
	NS_MYEXTENSION_TALK => 'MyNamespace_talk',
);

/** Finnish (Suomi) */
$namespaceNames['fi'] = array(
	NS_MYEXTENSION => 'Nimiavaruuteni',
	NS_MYEXTENSION_TALK => 'Keskustelu_nimiavaruudestani',
);

Then load the namespace translation file in MyExtension.php via $wgExtensionMessagesFiles['MyExtensionNamespaces'] = dirname( __FILE__ ) . '/MyExtension.namespaces.php';

Now, when a user installs MyExtension on their Finnish (fi) wiki, the custom namespace will be translated into Finnish magically, and the user doesn't need to do a thing!

Also remember to register your extension's namespace(s) on the extension default namespaces page.

Special page aliases

See the manual page for Special pages for up-to-date information. The following does not appear to be valid.

Create a new file for the special page aliases in this format:

<?php
/**
 * Aliases for the MyExtension extension.
 *
 * @file
 * @ingroup Extensions
 */

$aliases = array();

/** English */
$aliases['en'] = array(
	'MyExtension' => array( 'MyExtension' )
);

/** Finnish (Suomi) */
$aliases['fi'] = array(
	'MyExtension' => array( 'Lisäosani' )
);

Then load it in the extension's setup file like this: $wgExtensionMessagesFiles['MyExtensionAlias'] = dirname( __FILE__ ) . '/MyExtension.alias.php';

When your special page code uses either SpecialPage::getTitleFor( 'MyExtension' ) or $this->getTitle() (in the class that provides Special:MyExtension), the localised alias will be used, if it's available.

Message parameters

Some messages take parameters. They are represented by $1, $2, $3, … in the (static) message texts, and replaced at run time. Typical parameter values are numbers (the "3" in "Delete 3 versions?"), or user names (the "Bob" in "Page last edited by Bob"), page names, links and so on, or sometimes other messages. They can be of arbitrary complexity.

Switches in messages…

See also Manual:Messages API#Notes about gender, grammar, plural.

Parameter values sometimes influence the exact wording of a message, or require grammatical variations in it. We don't resort to ugly constructs like "$1 (sub)page(s) of his/her userpage", because these are poor for users and we can do better. Instead, we make switches that are parsed according to values that will be known at run time. The static message text then supplies each of the possible choices in a list, preceded by the name of the switch, and a reference to the value that makes a difference. This resembles the way parser functions are called in MediaWiki. Several types of switches are available. These only work if you do full parsing, or {{-transformation, for the messages.

…on numbers via PLURAL

See also Manual:Messages API#Notes about gender, grammar, plural.

MediaWiki supports plurals, which makes for a nicer-looking product. For example:

'undelete_short' => 'Undelete {{PLURAL:$1|one edit|$1 edits}}',

If there is an explicit plural form to be given for a specific number, it is possible with the following syntax

'Box has {{PLURAL:$1|one egg|$1 eggs|12=a dozen eggs}}.'

Be aware of PLURAL use on all numbers

See also Plural.

When a number has to be inserted into a message text, be aware that some languages will have to use PLURAL on it even if always larger than 1. The reason is that PLURAL in languages other than English can make very different and complex distinctions, comparable to English 1st, 2nd, 3rd, 4th, … 11th, 12th, 13th, … 21st, 22nd, 23rd, … etc.

Do not try to supply three different messages for cases like "no items counted", "one item counted", "more items counted". Rather, let one message take them all, and leave it to translators and PLURAL to properly treat any possible differences of presentation for them in their respective languages.

Always include the number as a parameter if possible. Always add {{PLURAL:}} syntax to the source messages if possible, even if it makes no sense in English. The syntax guides translators.

Fractional numbers are supported, but the plural rules may not be complete.
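To make the idea of a PLURAL switch with an explicit form (such as 12=) more concrete, here is a deliberately simplified, English-only sketch. The chooseEnglishPlural() function is hypothetical and is not how MediaWiki implements PLURAL; the real rules are defined per language and are considerably more complex.

```php
<?php
/**
 * Pick a form for a simplified, English-only PLURAL switch.
 * Illustrative only: MediaWiki's real plural rules are per-language.
 *
 * @param int $count The number passed as the message parameter
 * @param string[] $forms Forms as written in the message,
 *   e.g. [ 'one egg', '$1 eggs', '12=a dozen eggs' ]
 */
function chooseEnglishPlural( int $count, array $forms ): string {
	$regular = [];
	foreach ( $forms as $form ) {
		// Explicit forms look like "12=a dozen eggs" and take precedence.
		if ( preg_match( '/^(\d+)=(.*)$/', $form, $m ) ) {
			if ( (int)$m[1] === $count ) {
				return $m[2];
			}
		} else {
			$regular[] = $form;
		}
	}
	if ( $regular === [] ) {
		return (string)$count;
	}
	// English: first form for exactly 1, last form for everything else.
	$index = ( $count === 1 ) ? 0 : count( $regular ) - 1;
	return str_replace( '$1', (string)$count, $regular[$index] );
}

$forms = [ 'one egg', '$1 eggs', '12=a dozen eggs' ];
foreach ( [ 1, 3, 12 ] as $n ) {
	echo 'Box has ', chooseEnglishPlural( $n, $forms ), ".\n";
}
```

A language with more plural categories would need more regular forms and a different selection rule, which is why the choice is left to translators and to PLURAL rather than to the calling code.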

Pass the number of list items as parameters to messages talking about lists

Don't assume that there are only singular and plural forms. Many languages have more than two forms, which depend on the actual number, and their grammar has to vary with the number of list items when describing a list that is visible to readers. Thus, whenever your code computes a list, include count( $list ) as a parameter to headlines, lead-ins, footers and other messages about the list, even if the count is not used in English. There is a neutral way to talk about invisible lists, so you can have links to lists on extra pages without having to count items in advance.

…on user names via GENDER

See also Manual:Messages API#Notes about gender, grammar, plural.

'foobar-edit-review' => 'Please review {{GENDER:$1|his|her|their}} edits.'

If you refer to a user in a message, pass the user name as parameter to the message and add a mention in the message documentation that gender is supported. If it is likely that GENDER will be used in translations for languages with gender inflections, add it explicitly in the English language source message.

If you directly address the currently logged-in user, leave the user name as parameter empty:

'foobar-logged-in-user' => 'You said {{GENDER:|you were male|you were female|nothing about your gender}}.'

Users have grammatical genders

See also Gender.

When a message talks about a user, or relates to a user, or addresses a user directly, the user name should be passed to the message as a parameter. Thus languages having to, or wanting to, use proper gender dependent grammar, can do so. This should be done even when the user name is not intended to appear in the message, such as in "inform the user on his/her talk page", which is better made "inform the user on {{GENDER:$1|his|her|their}} talk page" in English as well.

This does not mean that you are encouraged to "sexualise" messages' language: please use gender-neutral language whenever this can be done with clarity and precision.
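As a rough illustration of how a GENDER switch selects one of its forms, consider the following self-contained sketch. The chooseGenderForm() helper and its 'male' / 'female' / 'unknown' input values are assumptions made for this example; in MediaWiki itself the gender is looked up from the preferences of the user whose name is passed as the parameter.

```php
<?php
/**
 * Choose one form of a GENDER switch.
 * Illustrative helper, not part of MediaWiki.
 *
 * @param string $gender 'male', 'female' or 'unknown' (hypothetical input)
 * @param string[] $forms Forms in the order male, female, unknown
 */
function chooseGenderForm( string $gender, array $forms ): string {
	$positions = [ 'male' => 0, 'female' => 1, 'unknown' => 2 ];
	$index = $positions[$gender] ?? 2;
	// If the requested form was not supplied, fall back to the first one.
	return $forms[$index] ?? $forms[0] ?? '';
}

$forms = [ 'his', 'her', 'their' ];
foreach ( [ 'male', 'female', 'unknown' ] as $gender ) {
	echo 'Please review ', chooseGenderForm( $gender, $forms ), " edits.\n";
}
```

The calling code only supplies the user; which words change, and how, remains entirely in the message text, so each translation can apply gender wherever its grammar requires it.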

…on use context inside sentences via GRAMMAR

See also Manual:Messages API#Notes about gender, grammar, plural.

Grammatical transformations for agglutinative languages are also available. One example is Finnish, where it was an absolute necessity to make the language files site-independent, i.e. to remove the Wikipedia references. In Finnish, "about Wikipedia" becomes "Tietoja Wikipediasta" and "you can upload it to Wikipedia" becomes "Voit tallentaa tiedoston Wikipediaan". Suffixes are added depending on how the word is used, plus minor modifications to the base. There is a long list of exceptions, but since only a few words needed to be translated, such as the site name, we didn't need to include it.

MediaWiki has grammatical transformation functions for over 20 languages. Some of these are just dictionaries for Wikimedia site names, but others have simple algorithms which will fail for all but the most common cases.

Even before MediaWiki had arbitrary grammatical transformation, it had a nominative/genitive distinction for month names. This distinction is necessary for some languages if you wish to substitute month names into sentences.

Filtering special characters in parameters and messages

The other (much simpler) issue with parameter substitution is HTML escaping. Despite being much simpler, MediaWiki does a pretty poor job of it.
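The general idea of escaping parameters before they are substituted into a message can be sketched as follows. This is a standalone illustration using PHP's htmlspecialchars() and strtr(); the substituteEscaped() helper is hypothetical and does not reproduce the behaviour of MediaWiki's own message classes. Note that it escapes only the parameters, not the message text itself.

```php
<?php
/**
 * Substitute $1, $2, ... into a message text, HTML-escaping each
 * parameter value first. Illustrative helper, not MediaWiki code.
 */
function substituteEscaped( string $text, array $params ): string {
	$replacements = [];
	foreach ( array_values( $params ) as $i => $value ) {
		$replacements['$' . ( $i + 1 )] =
			htmlspecialchars( (string)$value, ENT_QUOTES, 'UTF-8' );
	}
	// strtr() tries the longest keys first, so $10 is not mistaken for $1,
	// and already-substituted text is not searched again.
	return strtr( $text, $replacements );
}

// A user name containing markup is rendered harmless.
echo substituteEscaped(
	'Page last edited by $1',
	[ '<script>alert(1)</script>' ]
), "\n";
```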

Message documentation

There is a pseudo-language code qqq for message documentation. It is one of the ISO 639 codes reserved for private use. There, we do not keep translations of each message, but collect English sentences about each message: telling us where it is used, giving hints about how to translate it, enumerating and describing its parameters, linking to related messages, and so on. In translatewiki.net, these hints are shown to translators when they edit messages.

Programmers must document each and every message. Message documentation is an essential resource – not just for translators, but for all the maintainers of the module. Whenever a message is added to the software, a corresponding qqq entry must be added as well; revisions which don't do so are marked "V-1" until the documentation is added.

Documentation in qqq files should be edited directly only when adding new messages or when changing an existing English message in a way that requires a documentation change, for example adding or removing parameters. In other cases, documentation should usually be edited in translatewiki.net. Each documentation string is accessible at https://translatewiki.net/wiki/MediaWiki:message-key/qqq, as if it were a translation. These edits will be exported to the source repositories along with the translations.

Useful information that should be in the documentation includes:

  1. Message handling (parsing, escaping, plain text).
  2. Type of parameters with example values.
  3. Where the message is used (pages, locations in the user interface).
  4. How the message is used where it is used (a page title, button text, etc.).
  5. What other messages are used together with this message, or which other messages this message refers to.
  6. Anything else that could be understood when the message is seen in context, but not when the message is displayed alone (which is the case when it is being translated).
  7. If applicable, notes about grammar. For example, "open" in English can be both a verb and an adjective. In many other languages the words are different and it's impossible to guess how to translate them without documentation.
  8. Adjectives that describe things, such as "disabled", "open" or "blocked", must always say what they are describing. In many languages adjectives must have the gender of the noun that they describe. It may also happen that different kinds of things need different adjectives.
  9. If the message has special properties, for example, if it is a page name, or if it should not be a direct translation, but adapted to the culture or the project.
  10. Whether the message appears near other messages, for example in a list or a menu. The wording or the grammatical features of the words should probably be similar to the messages nearby. Also, items in a list may have to be properly related to the heading of the list.
  11. Parts of the message that must not be translated, such as generic namespace names, URLs or tags.
  12. Explanations of potentially unclear words, for example abbreviations, like "CTA", or specific jargon, like "template", "suppress" or "stub". (Note that it's best to avoid such words in the first place!)
  13. Screenshots are very helpful. Don't crop – an image of the full screen in which the message appears gives complete context and can be reused in several messages.

A few other hints:

  • Remember that very, very often translators translate the messages without actually using the software.
  • Translators usually have no context information, neither about your module nor about other messages in it.
  • A rephrased message alone is useless in most circumstances.
  • Don't use designers' jargon like "nav" or "comps".
  • Consider writing a glossary of the technical terms that are used in your module. If you do it, link to it from the messages.

You can link to other messages by using {{msg-mw|message key}}. Please do this if parts of the messages come from other messages (if this cannot be avoided), or if some messages are shown together or in same context.

translatewiki.net provides some default templates for documentation:

  • {{doc-action|[...]}} for action- messages
  • {{doc-right|[...]}} for right- messages
  • {{doc-group|[...]|[...]}} for messages around user groups (group, member, page, js and css)
  • {{doc-accesskey|[...]}} for accesskey- messages

Have a look at the template pages for more information.

Internationalisation hints

Besides documentation, translators ask developers to follow some hints that make their work easier and more efficient and allow a genuinely good localisation for all languages. Even if only adding or editing messages in English, one should be aware of the needs of all languages. Each message is translated into more than 300 languages and this should be done in the best possible way. Correct implementation of these hints will very often help you write better messages in English, too.

These are the main places where you can find the assistance of experienced and knowledgeable people regarding i18n:

Please do ask there!

Use message parameters and switches properly

That is a prerequisite for the correct wording of your messages.

Avoid message re-use

The translators discourage message re-use. This may seem counter-intuitive, because copying and duplicating code is usually a bad practice, but in system messages it is often needed. Although two concepts can be expressed with the same word in English, this doesn't necessarily mean they can be expressed with the same word in every language. "OK" is a good example: in English this is used for a generic button label, but in some languages they prefer to use a button label related to the operation which will be performed by the button. Another example is practically any adjective: a word like "multiple" changes according to gender in many languages, so you cannot reuse it to describe several different things, and you must create several separate messages.

If you are adding multiple identical messages, please add message documentation to describe the differences in their contexts. Don't worry about the extra work for translators. Translation memory helps a lot in these while keeping the flexibility to have different translations if needed.

Avoid fragmented or 'patchwork' messages

Languages have varying word orders, and complex grammatical and syntactic rules. It's very hard to translate "lego" messages, that is messages formed by multiple pieces of text, possibly with some indirection (also called "string concatenation").

It is better to make every message a complete sentence, each with a full stop at the end. Several sentences can usually be combined into a text block much more easily, if needed. When you want to combine several strings in one message, pass them in as parameters, as translators can order them correctly for their language when translating.
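The benefit of passing strings in as parameters, rather than concatenating fragments, can be shown with a small standalone sketch. The message texts and the fillMessage() helper are hypothetical; the German wording is included only to show a word order that differs from English.

```php
<?php
// Illustrative message texts; each is one complete sentence.
$messages = [
	'en' => '$1 moved $2 to $3.',
	'de' => '$1 hat $2 nach $3 verschoben.',
];

/**
 * Replace $1, $2, ... with the given values.
 * Illustrative helper, not MediaWiki code.
 */
function fillMessage( string $text, array $params ): string {
	$map = [];
	foreach ( array_values( $params ) as $i => $value ) {
		$map['$' . ( $i + 1 )] = (string)$value;
	}
	return strtr( $text, $map );
}

// The calling code is identical for every language;
// only the message text decides where each value goes.
$params = [ 'Bob', 'Old title', 'New title' ];
foreach ( $messages as $lang => $text ) {
	echo $lang, ': ', fillMessage( $text, $params ), "\n";
}
```

Had the sentence been assembled in code as user name, then verb, then titles, the German verb could not have been moved to the end without changing the code.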

Messages quoting each other

An exception to the rule may be messages referring to one another: 'Enter the original author's name in the field labelled "{{int:name}}" and click "{{int:proceed}}" when done'. This keeps the message consistent when a software developer or wiki operator alters the messages "name" or "proceed" later. Without the int-hack, developers and operators would have to be aware of all related messages needing adjustment when they alter one.

Separate times from dates in sentences

Some languages have to insert something between a date and a time which grammatically depends on other words in a sentence. Thus, they will not be able to use date/time combined. Others may find the combination convenient, thus it is usually the best choice to supply three parameter values (date/time, date, time) in such cases, and in each translation leave either the first one or last two unused as needed.
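A minimal sketch of the three-parameter approach follows. The two message texts are hypothetical, and PHP's gmdate() merely stands in for MediaWiki's language-aware date and time formatting so that the example runs on its own.

```php
<?php
// Hypothetical message texts: one wording uses the combined value ($1),
// the other uses the separate date ($2) and time ($3).
$combinedStyle = 'Last edited on $1.';
$separateStyle = 'Last edited on $2 at $3.';

// 1971-01-01 00:00:00 UTC, fixed so that the output is stable.
$timestamp = 86400 * 365;

// Always supply all three values; each translation uses what it needs.
$params = [
	'$1' => gmdate( 'H:i, j F Y', $timestamp ),
	'$2' => gmdate( 'j F Y', $timestamp ),
	'$3' => gmdate( 'H:i', $timestamp ),
];

$combined = strtr( $combinedStyle, $params );
$separate = strtr( $separateStyle, $params );

echo $combined, "\n";
echo $separate, "\n";
```

Parameters that a given translation does not mention are simply ignored, so supplying all three costs nothing for languages that only need the combined form.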

Avoid {{SITENAME}} in messages

{{SITENAME}} has several disadvantages. It can be anything (acronym, word, short phrase, etc.) and, depending on language, may need the use of {{GRAMMAR}} on each occurrence. No matter what, each message having {{SITENAME}} will need review in most wiki languages for each new wiki on which your code is installed. In the majority of cases, when there is not a general GRAMMAR configuration for a language, wiki operators will have to add or amend PHP code so as to get {{GRAMMAR}} for {{SITENAME}} working. This requires both more skills, and more understanding, than otherwise. It is more convenient to have generic references like "this wiki". This does not keep installations from locally altering these messages to use {{SITENAME}}, but at least they don't have to, and they can postpone message adaption until the wiki is already running and used.

Avoid references to visual layout and positions

What is rendered where depends on skins. Most often screen layouts of languages written from left-to-right are mirrored compared to those used for languages written from right-to-left, but not always, and for some languages and wikis, not entirely. Handheld devices, narrow windows, and so on may show blocks underneath each other, that would appear side-by-side on larger displays. Since site- and user-written JavaScript scripts and gadgets can, and do, hide parts, or move things around in unpredictable ways, there is no reliable way of knowing the actual layout.

It is wrong to tie layout information to content languages, since the user interface language may not be the page's content language, and layout may be a mixture of the two depending on circumstances. Non-visual user agents like acoustic screen readers and other auxiliary devices do not even have a concept of visual layout. Thus, you should not refer to visual layout positions in the majority of cases, though semantic layout terms may still be used ("previous steps in the form", etc.).

MediaWiki does not support showing different messages or message fragments based on the current directionality of the interface (see T30997).

The upcoming browser and MediaWiki support for East and North Asian top-down writing[2] will make screen layouts even more unpredictable, with at least eight possible layouts (left/right starting position, top/bottom starting position, and which happens first).

Avoid references to screen colours

The colour in which something is rendered depends on many factors, including skins, site- and user-written JavaScript scripts and gadgets, and local user agent over-rides for reasons of accessibility or technological limitations. Non-visual user agents like acoustic screen readers and other auxiliary devices do not even have a concept of colour. Thus, you should not refer to screen colours. (You should also not rely on colour alone as a mechanism for informing the user of state, for the same reason.)

Have message elements before and after each input field

This is a suggested guideline; it has not become standard in MediaWiki development.

While English allows efficient use of prompting in the form item–colon–space–input-field, many other languages don't. Even in English, you often want to use "Distance: ___ metres" rather than "Distance (in metres): ___". Leaving <textarea> elements aside, you should think of each and every input field following the "Distance: ___ metres" pattern. So:

  • give it two messages, even if the 2nd one is empty in English and some other languages, or
  • allow the placement of inputs via $i parameters.

Avoid untranslated HTML markup in messages

HTML markup not requiring translation, such as enclosing <div>s, rulers above or below, and similar, should usually not be part of messages. Such markup unnecessarily burdens translators, increases message file size, and risks being accidentally altered or skipped in the translation process. In general, avoid raw HTML in messages if you can.

Messages are often longer than you think!

Skimming foreign language message files, you find messages almost never shorter than Chinese ones, rarely shorter than English ones, and most usually much longer than English ones.

Especially in forms, in front of input fields, English messages tend to be terse and short. That is often not kept in translations. In particular, languages without an established technical vocabulary, as well as vernacular, mediæval, or ancient languages, may require multiple words or even complete sentences to explain foreign or technical prompts. For example, the brief English message "TSV file:" may have to be translated in a language as literally:

Please type a name here which denotes a collection of computer data that is comprised of a sequentially organised series of typewritten lines which themselves are organised as a series of informational fields each, where said fields of information are fenced, and the fences between them are single signs of the kind that slips a typewriter carriage forward to the next predefined position each. Here we go: _____ (thank you)

This is, admittedly, an extreme example, but you get the idea. Imagine this sentence in a column in a form where each word occupies a line of its own, and the input field is vertically centered in the next column. :-(

Avoid using very close, similar, or identical words to denote different things, or concepts

For example, pages may have older revisions (of a specific date, time, and edit), comprising past versions of said page. The words revision, and version can be used interchangeably. A problem arises, when versioned pages are revised, and the revision, i.e. the process of revising them, is being mentioned, too. This may not pose a serious problem when the two synonyms of "revision" have different translations. Do not rely on that, however. It is better to avoid the use of "revision" aka "version" altogether, then, so as to avoid it being mis-interpreted.

Basic words may have unforeseen connotations, or not exist at all

There are some words that are hard to translate because of their very specific use in MediaWiki. Some may not be translated at all. For example, there is no word "user" relating to "someone who uses something" in several languages. Similarly, in Kölsch the English words "namespace" and "apartment" translate to the same word. Sticking to Kölsch, they say "corroborator and participant" in one word since any reference to "use" would too strongly imply "abuse" as well. The term "wiki farm" is translated as "stable full of wikis", since a single-crop farm would be a contradiction in terms in the language, and not understood, etc.

Expect untranslated words

This is a suggested guideline; it has not yet become standard in MediaWiki development.

It is not uncommon that proper names, tag names, etc. and computerese in English are not translated, and instead taken as loan-words, or foreign words. In the latter case, some particularly-fastidious translators may mark such words as belonging to another language with HTML markup, such as <span lang="en" xml:lang="en"></span>.

You may want to consider ensuring that your message output handler passes such markup along unmolested, despite the obvious security risks.

Permit explanatory inline markup

This is a suggested guideline; it has not yet become standard in MediaWiki development.

Sometimes there are abbreviations, technical terms, or generally ambiguous words in target languages that may not be immediately understood by newcomers, but are obvious to experienced computer users. So as to avoid screen clutter of lengthy explanations without leaving newcomers stranded, translators may choose to add explanations as <abbr> annotations, shown by browsers when you move the mouse over them.

For example, the MediaWiki core message exif-orientation-8 about image rotation, which in English is simply "Rotated 90° CW", in Moroccan Arabic is translated as:

mḍwwer 90° <abbr title="Ĝks (ṫ-ṫijah) Ĝaqarib s-Saĝa">ĜĜS</abbr>

giving:

mḍwwer 90° ĜĜS

explaining the abbreviation for "counter clockwise" when needed.

You may want to consider ensuring that your message output handler passes such markup along unmolested, even if the original message does not use them.

Use <code>, <var>, and <kbd> tags where needed

When talking about technical parameters, values, or keyboard inputs, mark them appropriately as such using the HTML tags <code>, <var>, or <kbd>. Thus they are typographically set off from the normal text. That clarifies their sense to readers, avoiding confusion, errors and misrepresentations. Ensure that your message handler allows such markup.

Symbols, colons, brackets, etc. are parts of messages

Many symbols are localisable, too. Some scripts have other kinds of brackets than the Latin script has. A colon may not be appropriate after a label or input prompt in some languages. Having those symbols included in messages leads to better and less Anglo-centric translations, and incidentally reduces code clutter.

If you need to wrap some text in localized parentheses, brackets, or quotation marks, you can use the parentheses ($1) or brackets [$1] or quotation-marks "$1" messages like so:

wfMessage( 'parentheses' )->rawParams( /* text to go inside parentheses */ )->escaped()
wfMessage( 'brackets' )->rawParams( /* text to go inside brackets */ )->escaped()
wfMessage( 'quotation-marks' )->rawParams( /* text to go inside quotation marks */ )->escaped()

Do not expect symbols and punctuation to survive translation

Languages written from right to left (unlike English) usually swap the arrow symbols presented with "next" and "previous" links, and their placement relative to the message text may or may not be inverted as well. An ellipsis may be translated to "etc.", or to words. Question marks, exclamation marks, and colons may be placed somewhere other than at the end of a sentence, omitted altogether, or used twice. As a consequence, always include all of those in the text of your messages, and never try to insert them programmatically.
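As a minimal sketch of why punctuation belongs inside the message, the following Python model contrasts a label built by appending a colon in code with a label taken whole from the message text. The catalogue, the key name-label and both helper functions are invented for illustration and are not part of MediaWiki; the French text assumes the usual convention of a no-break space before the colon.

```python
# Hypothetical message catalogue; keys and texts are invented for illustration.
MESSAGES = {
    "en": {"name-label": "Name:"},
    "fr": {"name-label": "Nom\u00a0:"},  # no-break space before the colon
}

# The same words with the punctuation stripped, as a code-side approach needs.
BARE_WORDS = {"en": "Name", "fr": "Nom"}


def label_with_code_punctuation(lang):
    """Anti-pattern: the code decides that every language wants a bare ':'."""
    return BARE_WORDS[lang] + ":"


def label_from_message(lang):
    """Preferred: the translator controls the punctuation and the spacing."""
    return MESSAGES[lang]["name-label"]
```

Both functions give the same result for English, but only the second one lets the French translation keep its spacing before the colon.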

Use full stops

Do terminate normal sentences with full stops. This is often the only indicator for a translator to know that they are not headlines or list items, which may need to be translated differently.

Link anchors

Wikitext of links

Link anchors can be put into messages in several technical ways:

  1. via wikitext: … [[a wiki page|anchor]] …,
  2. via wikitext: … [some-url anchor] …, or
  3. the anchor text is a message in the MediaWiki namespace. Avoid it!

The last option is often hard or impossible for translators to handle; avoid patchwork messages here, too. Make sure that "some-url" does not contain spaces.

Use meaningful link anchors

Take care with your wording. Link anchors play an important role in search engine assessment of pages – both the words linked, and the target anchor. Make sure that the anchor describes the target page well. Always avoid commonplace and generic words. For example, "Click here" is an absolute no-go,[3] since target pages are almost never about "click here". Do not put that in sentences around links either, because "here" was not the place to click. Instead, use precise action words telling users where they will get to when following the link, such as "You can upload a file if you wish."

See also Help users predict where they are going and mystery meat navigation.

Avoid jargon and slang

Avoid developer and power user jargon in messages. Try to use a simple language whenever possible.

Avoid saying "success", "successfully", "fail", "error occurred while", etc., when you want to notify the user that something happened or didn't happen. This comes from developers seeing everything as true or false, but users usually just want to know what actually happened or didn't, and what they should do about it (if anything). So:

  • "The file was successfully renamed" -> "The file was renamed"
  • "File renaming failed" -> "There is a file with this name already. Please choose a different name."

One sentence per line

This is a suggested guideline; it has not yet become standard in MediaWiki development.

Try to have one sentence or similar block per line. This helps to compare the messages in different languages, and may be used as a hint for segmentation and alignment in translation memories.

Be aware of whitespace and line breaks

MediaWiki's localised messages usually get edited within the wiki, either by wiki operations on live wikis, or by the translators on translatewiki.net. You should be aware of how whitespace, especially at the beginning or end of your message, will affect editors:

  • Newlines at the beginning or end of a message are fragile, and will be frequently removed by accident. Start and end your message with active text; if you need a newline or paragraph break around it, your surrounding code should deal with adding it to the returned text.
  • Spaces at the beginning or end of a message are also likely to be removed during editing, and should be avoided. If a space is required for output, your code should usually append it, or else you should use a non-breaking space such as &nbsp; (in which case check your escaping settings!).
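The two points above can be sketched as follows: the messages carry only active text, and the surrounding code adds the paragraph breaks. This is an illustrative Python model; render_paragraphs is a hypothetical helper, not a MediaWiki function.

```python
def render_paragraphs(message_texts):
    """Join message texts into paragraphs.

    Leading and trailing whitespace in each message is discarded, because it
    is fragile under editing; the paragraph break is added here, by the code.
    """
    cleaned = [text.strip() for text in message_texts]
    return "\n\n".join(text for text in cleaned if text)
```

With this approach an editor who accidentally removes or adds a trailing newline in a message does not change the rendered layout.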

Use standard capitalisation

Capitalisation gives hints to translators as to what they are translating, such as single words, list or menu items, phrases, or full sentences. Correct (standard) capitalisation may also play a role in search engines' assessment of your pages. MediaWiki uses sentence case (The quick brown fox jumps over the lazy dog) in interface messages.

Always remember that many writing systems don't have capital letters at all, and some of those that do have them, use them differently from English. Therefore, don't use ALL-CAPS for emphasis. Use CSS, or HTML <em> or <strong> per below:

Emphasis

In normal text, emphasis such as boldface or italics should be part of message texts. Local conventions on emphasis vary; some Asian scripts in particular have their own. Translators must be able to adjust emphasis to their target languages and areas. Try to use "<em>" and "<strong>" in your user interface to allow mark-up on a per language or per script basis.

In modern screen layouts of English and European styles, emphasis becomes less used. Do convey it in your message documentation still, as it may give valuable hints as to how to translate. Emphasis can and should be used in other cultural contexts as appropriate, provided that translators know about it.

Overview of the localisation system

Update of localisation

As mentioned above, translation happens on translatewiki.net and other systems are discouraged. Here's a high level overview of the localisation update workflow:

  • Developers add or change system messages.
  • Users translate the new or changed system messages on translatewiki.net.
  • Automated tools export these messages, build new versions of the message files, incorporating the added or updated messages, for both core and extensions, and commit them to git.
  • The wikis then can pull in the updated system messages from the git repository.

Wikimedia projects and any other wikis can benefit immediately and automatically from localisation work thanks to the LocalisationUpdate extension.[4] This compares the latest English messages to the English messages in production. If they are the same, the production translations are updated and made available to users.

Once translations are in the version control system, the Wikimedia Foundation has a daily job that updates a checkout or clone of the extension repository. This was first established in September 2009.[5]

Because changes on translatewiki.net are pushed to the code daily as well, each change to a message can potentially be applied to all existing MediaWiki installations within a couple of days, without any manual intervention or traumatic code update.

As you can see this is a multi-step process. Over time, we have found out that many things can go wrong. If you think the process is broken, please make sure to report it on our Support page, or create a new bug in Phabricator. Always be sure to describe a precise observation.

Handling support requests

Main page: translatewiki:Translating:Localisation for developers.

Translators may have questions about some of the messages you create. Translatewiki.net provides a support request system that lets translators ask you, the project owner, questions about messages so that they can be translated better. This short tutorial guides you through the workflow of handling translatewiki.net support requests.

Message sources

Code looks up system messages from these sources:

  • The MediaWiki namespace. This allows wikis to adopt, or override, all of their messages, when standard messages do not fit or are not desired (see #Old local translation system).
    • MediaWiki:Message-key is the default message,
    • MediaWiki:Message-key/language-code is the message to be used when a user has selected a language other than the wiki's default language.
  • From message files:
    • Core MediaWiki itself and most currently maintained extensions use a file per language, named zyx.json, where zyx is the language code for the language.
    • Some older extensions use a combined message file holding all messages in all languages, usually named MyExtensionName.i18n.php.
    • Many Wikimedia Foundation wikis access some messages from the WikimediaMessages extension, allowing them to standardise messages across WMF wikis without imposing them on every MediaWiki installation.
    • A few extensions use other techniques.
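As a small illustration of the MediaWiki namespace naming described in the list above, the following Python sketch builds the title of the page that would hold a local override. The function name is hypothetical, and the capitalisation of the first letter of the key is an assumption based on the "Message-key" spelling used above.

```python
def override_page_title(message_key, user_lang, content_lang):
    """Return the MediaWiki-namespace page title holding a local override.

    The bare title is used for the wiki's default (content) language; any
    other language gets a "/language-code" suffix.
    """
    # Assumption: the first letter of the key is capitalised in the title.
    title = "MediaWiki:" + message_key[:1].upper() + message_key[1:]
    if user_lang != content_lang:
        title += "/" + user_lang
    return title
```

For a wiki whose content language is English, override_page_title("mainpage", "en", "en") gives "MediaWiki:Mainpage", while a user who selected German would be served from "MediaWiki:Mainpage/de" if that page exists.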

Caching

System messages are one of the more significant components of MediaWiki, primarily because they are used in every web request. The PHP message files are large, since they store thousands of message keys and values. Loading such a file (and possibly several files, if the user's language differs from the content language) has a large memory and performance cost. An aggressive, layered caching system is used to reduce this performance impact.

MediaWiki has lots of caching mechanisms built in, which make the code somewhat more difficult to understand. Since 1.16 there is a new caching system, which caches messages either in .cdb files or in the database. Customised messages are cached in the filesystem and in memcached (or alternative), depending on the configuration.

The table below gives an overview of the settings involved:

Location of cache storage, by the 'store' value of $wgLocalisationCacheConf (columns) and the value of $wgCacheDirectory (rows):

$wgCacheDirectory    'store' => 'db'     'store' => 'detect' (default)   'store' => 'files'
= false (default)    l10n cache table    l10n cache table                error (undefined path)
= path               l10n cache table    local filesystem                local filesystem
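The table can also be read as a small decision function. The following Python sketch only restates the six cells of the table; resolve_l10n_store is a hypothetical name, None stands in for $wgCacheDirectory = false, and nothing here is MediaWiki's actual implementation.

```python
def resolve_l10n_store(store="detect", cache_directory=None):
    """Return where the localisation cache ends up, following the table.

    store           -- the 'store' value of $wgLocalisationCacheConf
    cache_directory -- a path, or None to model $wgCacheDirectory = false
    """
    if store == "db":
        return "l10n cache table"
    if store == "detect":
        return "local filesystem" if cache_directory else "l10n cache table"
    if store == "files":
        if not cache_directory:
            raise ValueError("'files' store requires a cache directory")
        return "local filesystem"
    raise ValueError("store value not covered by the table: %r" % (store,))
```

With both settings left at their defaults the function returns "l10n cache table"; setting only a cache directory is enough to move the cache to the local filesystem.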

Function backtrace

To better visually depict the layers of caching, here is a function backtrace of what methods are called when retrieving a message. See the below sections for an explanation of each layer.

  • Message::fetchMessage()
  • MessageCache::get()
  • Language::getMessage()
  • LocalisationCache::getSubitem()
  • LCStore::get()

MessageCache

The MessageCache class is the top level of caching for messages. It is called from the Message class and returns the final raw contents of a message. This layer handles the following logic:

The last bullet is important. Language fallbacks allow MediaWiki to fall back on another language if the original does not have a message being asked for. As mentioned in the next section, most of the language fallback resolution occurs at a lower level. However, only the MessageCache layer checks the database for overridden messages. Thus integrating overridden messages from the database into the fallback chain is done here. If not using the database, this entire layer can be disabled.
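To make the idea of merging database overrides into the fallback chain concrete, here is a deliberately simplified Python model. It is not the MessageCache implementation, and the real order of checks may differ; get_message, the data structures and the sample texts are all invented for illustration.

```python
def get_message(key, lang, fallbacks, db_overrides, file_messages):
    """Look a message up along the fallback chain (simplified model).

    For each language in the chain, a local override from the database wins
    over the text shipped in the message files. English is the final fallback.
    """
    chain = [lang] + [code for code in fallbacks.get(lang, []) if code != lang]
    if "en" not in chain:
        chain.append("en")
    for code in chain:
        override = db_overrides.get((key, code))
        if override is not None:
            return override
        text = file_messages.get(code, {}).get(key)
        if text is not None:
            return text
    return None


# Illustrative data only: one language falling back to German, then English.
FALLBACKS = {"ksh": ["de"]}
DB_OVERRIDES = {("sitenotice", "de"): "Lokale Mitteilung"}
FILE_MESSAGES = {
    "en": {"sitenotice": "Notice", "mainpage": "Main Page"},
    "de": {"mainpage": "Hauptseite"},
}
```

In this model a request for "sitenotice" in "ksh" finds nothing for "ksh" itself, then picks up the German database override before ever reaching the English file message.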

LocalisationCache

See LocalisationCache

LCStore

The LCStore class is merely a back-end implementation used by the LocalisationCache class for actually caching and retrieving messages. Like the BagOStuff class, which is used for general caching in MediaWiki, there are a number of different cache types (configured using $wgLocalisationCacheConf):

  • "db" (used by default when $wgCacheDirectory is not set) - Caches messages in the database
  • "files" (used by default when $wgCacheDirectory is set) - Uses CDB to cache messages in a local file
  • "accel" - Uses APC or another opcode cache to store the data

The "files" option is used by the Wikimedia Foundation, and is recommended because it is faster than going to the database and more reliable than the APC cache, especially since APC is incompatible with PHP versions 5.5 or later.

Licence

Any edits made to the language files must be licensed under the terms of the GNU General Public License to be included in the MediaWiki software. Extensions may be under different licences.

Old local translation system

With MediaWiki 1.3.0, a new system was set up for localising MediaWiki. Instead of editing the language file and asking developers to apply the change, users could edit the interface strings directly from their wikis. This is the system in use as of August 2005. People can find the message they want to translate in Special:AllMessages and then edit the relevant string in the MediaWiki: namespace. Once edited, these changes are live. There was no more need to request an update, and wait for developers to check and update the file.

The system is great for Wikipedia projects; however a side effect is that the MediaWiki language files shipped with the software are no longer quite up-to-date, and it is harder for developers to keep the files on meta in sync with the real language files.

As the default language files do not provide enough translated material, we face two problems:

  1. New Wikimedia projects created in a language which has not been updated for a long time, need a total re-translation of the interface.
  2. Other users of MediaWiki (including Wikimedia projects in the same language) are left with untranslated interfaces. This is especially unfortunate for the smaller languages which don't have many translators.

This is not such a big issue anymore, because translatewiki.net is advertised prominently and used for almost all translations. Local translations still happen sometimes, but they are strongly discouraged. Local messages mostly have to be deleted, moving the relevant translations to translatewiki.net and leaving on the wiki only the site-specific customisation. There is a huge backlog, especially in older projects; this tool helps with cleanup.

Keeping messages centralised and in sync

English messages are very rarely out of sync with the code. Experience has shown that it's convenient to have all the English messages in the same place. Revising the English text can be done without reference to the code, just like translation can. Programmers sometimes make very poor choices for the default text.

Appendix

What can be localised

So many things are localisable on MediaWiki that not all of them are directly available on translatewiki.net: see translatewiki:Translating:MediaWiki. If something requires a developer intervention on the code, you can request it on Phabricator, or ask at translatewiki:Support if you don't know what to do exactly.

Graph of language fallback
  • Fallback languages (that is, other more closely related language(s) to use when a translation is not available, instead of the default fallback, which is English)
  • Directionality (left to right or right to left, RTL)
  • Direction mark character depending on RTL
  • Arrow depending on RTL
  • Languages where italics cannot be used
  • Number formatting (comma-ify, i.e. adding or not digits separators; transform digits; transform separators)[6]
  • Truncate (multibyte)
  • Grammar conversions for inflected languages
  • Plural transformations
  • Formatting expiry times[clarification needed]
  • Segmenting for diffs (Chinese)
  • Convert to variants of language (between different orthographies, or scripts)
  • Language specific user preference options
  • Link trails and link prefix, e.g.: [[foo]]bar. These are letters that can be glued after or before the closing or opening brackets of a wiki link, and that are rendered on the screen as if they were part of the link (that is, clickable and in the same colour). By default the link trail is "a-z"; you may want to add the accented or non-Latin letters used by your language to the list.
  • Language code (preferably used according to the latest RFC in standard BCP 47, currently RFC 5646, with its associated IANA database. Avoid deprecated, grandfathered and private-use codes: look at what they mean in standard ISO 639, and avoid codes assigned to collections/families of languages in ISO 639-5, and ISO 639 codes which were not imported in the IANA database for BCP 47)
  • Type of emphasising
  • The Cite extension has a special page file per language, cite_text-zyx for language code zyx.
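The link trail entry in the list above can be sketched with a regular expression. The following Python function is an illustration only, not MediaWiki's parser code; split_link_trail and its trail_chars parameter are hypothetical names, and the default "a-z" is taken from the text above.

```python
import re


def split_link_trail(text_after_link, trail_chars="a-z"):
    """Split the text following ']]' into (link trail, remaining text).

    trail_chars is the body of a regex character class, for example "a-z"
    or "a-zäöüß" for a language that also glues accented letters to links.
    """
    pattern = re.compile(r"^([" + trail_chars + r"]+)(.*)$", re.DOTALL)
    match = pattern.match(text_after_link)
    if match is None:
        return "", text_after_link
    return match.group(1), match.group(2)
```

For the wikitext [[foo]]bar baz, the text after the link is "bar baz", and the function returns "bar" as the trail to be rendered inside the link. With the default character class a word starting with "ä" gets no trail at all, which is why a localisation may want to extend the list.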

Neat functionality:

  • I18N sprintfDate
  • Roman numeral formatting

Namespace name aliases

Namespace name aliases are additional names which can be used to address existing namespaces. They are rarely needed, but when they are needed and missing, this usually creates havoc in existing wikis.

You need namespace name aliases:

  1. When a language has variants, and these variants spell some namespaces differently, and you want editors to be able to use the variant spellings. Variants are selectable in the user preferences. Users always see their selected variant, except in wikitext, but when editing or searching, an arbitrary variant can be used.
  2. When an existing wiki's language, fallback language(s), or localisation is changed, and some namespace names change with it. So as not to break the links already present in the wiki that use the old namespace names, you need to add each of the previous namespace names to the aliases of its namespace when, or before, the change is made.

The generic English namespace names are always present as namespace name aliases in all localisations, so you need not, and should not, add those.
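The second case can be modelled in a few lines of Python. This is a sketch under stated assumptions, not MediaWiki code: resolve_namespace, the three lookup tables and the sample names (including the old name "Nutzer") are invented for illustration, and matching is assumed to ignore case and to treat underscores as spaces.

```python
def normalise(name):
    """Compare namespace names case-insensitively, with '_' equal to ' '."""
    return name.replace("_", " ").strip().casefold()


def resolve_namespace(name, local_names, aliases, canonical_names):
    """Return the namespace id for a name, or None if it is unknown.

    The current localised names are tried first, then the aliases, and
    finally the generic English names, which are always accepted.
    """
    wanted = normalise(name)
    for table in (local_names, aliases, canonical_names):
        for candidate, namespace_id in table.items():
            if normalise(candidate) == wanted:
                return namespace_id
    return None


# Illustrative data: a wiki whose localisation renamed namespace 2.
CANONICAL = {"User": 2, "User talk": 3}
LOCAL = {"Benutzer": 2, "Benutzer Diskussion": 3}
ALIASES = {"Nutzer": 2}  # hypothetical old name, kept so old links still work
```

Without the alias entry, every existing link that starts with the old name would stop pointing into namespace 2 once the localisation changes.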

Aliases can't be translated on translatewiki.net, but can be requested there or on Phabricator: see translatewiki:Translating:MediaWiki#Namespace name aliases.

Regional settings

Some linguistic settings vary across geographies; MediaWiki doesn't have a concept of region, it only has languages and language variants.

These settings need to be set once as a language's default, then individual wikis can change them as they wish in their configuration.

Time and date formats

Times and dates are shown on special pages and the like. The default time and date format is used for signatures, so it should be the most used and most widely understood format for users of that language. Anonymous users also see the default format. Registered users can choose other formats in their preferences.

If you are familiar with the format string of PHP's date() function, you can try to construct formats yourself. MediaWiki uses a similar format string, with some extra features. If you don't understand the previous sentence, that's OK. You can provide a list of examples for developers.

Old edit window toolbar buttons

Not to be confused with the much more common WikiEditor's "advanced toolbar", which has similar features.

When a wiki page is being edited, and a user has allowed it in their Special:Preferences, a set of icons is displayed above the text area where one can edit. The toolbar buttons can be set [1] but there are no messages for it. What we need is a set of properly sized .png files. Plenty of samples can be found in commons:Category:ButtonToolbar, and there is an empty button image to start off from.

Note, this can only be done when your language is already enabled in MediaWiki, which usually means a good portion of its messages have been translated; otherwise you must just wait, and have it done later.

Missing

This section is missing information about the changes in the i18n system related to extensions. The format was standardised and messages are automatically loaded.

See Message sources.

References

  1. https://gerrit.wikimedia.org/r/211677
  2. http://dev.w3.org/csswg/css3-writing-modes/
  3. http://www.w3.org/QA/Tips/noClickHere
  4. Which works through the localisation cache and for instance on Wikimedia projects updates it daily; see also the technical details about the specific implementation.
  5. LocalisationUpdate update; LocalisationUpdate is live.
  6. These are configured by language in the respective language/classes/LanguageXx.php or language/messages/MessagesXx.php files.
