Complex plurals

Pluralization is a key topic in localization, but its complexity is very often overlooked.  Every localization library supports singular and plural forms, but many languages have more than two plural forms — some even have six different forms!

We encountered this when localizing Wellmo to Polish.  Polish has four different plural forms (three are shown below; the fourth is used for decimals):

0 steps = 0 kroków
1 step = 1 krok
2 steps = 2 kroki
3 steps = 3 kroki
4 steps = 4 kroki
5 steps = 5 kroków
6 steps = 6 kroków
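
The pattern does not simply continue with "kroków" for every larger number: 22 takes "kroki" again, while 12 takes "kroków".  As a minimal sketch in Python, assuming non-negative integers only (the fourth, decimal form is left out) and using a hypothetical helper name polish_steps, the selection could look like this:

```python
def polish_steps(count: int) -> str:
    """Return '<count> <noun>' with the Polish form for a non-negative integer."""
    last_digit = count % 10
    last_two = count % 100
    if count == 1:
        noun = "krok"
    elif 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        # 2-4, 22-24, 32-34, ... but not 12-14
        noun = "kroki"
    else:
        noun = "kroków"
    return f"{count} {noun}"


for n in (0, 1, 2, 5, 12, 22):
    print(polish_steps(n))
```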

Getting pluralization correct for all languages poses a challenge, and more often than not, it is done wrong.

Where the trouble begins

The typical way to handle complex plurals is to use Unicode CLDR Plural Rules.  Unicode CLDR has defined six different plural categories which encompass all plural forms in all languages.  The plural categories have been named: zero, one, two, few, many, other.

While these names are indicative of their meaning, they are also wildly misleading.  Even the documentation of many localization libraries uses the categories incorrectly.  For example, the i18n-js library documentation contains the following example:

en:
  inbox:
    counting:
      one: You have 1 new message
      other: You have {{count}} new messages
      zero: You have no messages

The core problem here is that only the “other” version contains the placeholder {{count}}.  The other two forms assume that count is 1 for “one” and 0 for “zero”.  While this holds true for English (and in this case even the CLDR rules have been extended to include “zero” for English), you’re likely to run into trouble with translations.

The point to remember:  The category names do not indicate an amount!  The category name “one” should be understood as “the pluralization category similar to that used for the number 1”.  For example, French uses the category “one” for the count 0 as well as 1, and Russian also uses it for 21, 31, 41, etc. (but not for 11).
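
To make this concrete, here is a rough Python sketch of how a count maps to a category in three languages.  It is simplified to non-negative integers and is not a full copy of the CLDR data (decimals and any further rules are left out); the function names are made up for this illustration.

```python
def english_category(count: int) -> str:
    return "one" if count == 1 else "other"


def french_category(count: int) -> str:
    # Simplified: French groups 0 and 1 together under "one".
    return "one" if count in (0, 1) else "other"


def russian_category(count: int) -> str:
    last_digit, last_two = count % 10, count % 100
    if last_digit == 1 and last_two != 11:
        return "one"   # 1, 21, 31, ... but not 11
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"   # 2-4, 22-24, ... but not 12-14
    return "many"      # everything else among the integers


for n in (0, 1, 11, 21):
    print(n, english_category(n), french_category(n), russian_category(n))
```

The same category name "one" covers a different set of numbers in each language, which is why a text for "one" cannot hard-code the digit 1.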

If you sent the above file to a Russian translator, unless they are very experienced in this area, you’ll likely get back:

ru:
  inbox:
    counting:
      one: У вас есть 1 новое сообщение
      other: У вас есть {{count}} новых сообщений
      zero: У вас нет сообщений

(The texts above are from Google Translate)

When your user then has 21 new messages, Russian selects the category “one”, and the message will read “У вас есть 1 новое сообщение” (“You have 1 new message”).
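
The failure can be reproduced with a few lines of Python.  This is only a sketch: render is a hypothetical stand-in for a library's lookup, not the i18n-js implementation, and russian_category is a simplified integer-only version of the Russian rule.

```python
RU_NAIVE = {
    "one": "У вас есть 1 новое сообщение",
    "other": "У вас есть {{count}} новых сообщений",
    "zero": "У вас нет сообщений",
}


def russian_category(count: int) -> str:
    last_digit, last_two = count % 10, count % 100
    if last_digit == 1 and last_two != 11:
        return "one"
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"
    return "many"


def render(translations: dict, count: int) -> str:
    # Pick the text for the count's category, falling back to "other".
    category = russian_category(count)
    template = translations.get(category, translations["other"])
    return template.replace("{{count}}", str(count))


print(render(RU_NAIVE, 21))  # the "one" text, with a hard-coded 1
```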

While you can get this to work with experienced / in-house translators, I find this poor practice for the following reasons:

  • It’s very error prone:  a translator can easily translate the master text directly, instead of converting the specialized form to a generic form
  • It creates confusion about which form will be used for which language.  In French the form “one” is used for amounts 0 and 1, but what if a separate “zero” key exists?
  • Standard translation tools won’t know how to deal with your specially extended rules
  • Translation checker tools cannot verify whether correct placeholders are used in the translations

Doing it the right way

The rule is:  Whenever you use a CLDR plural form, include the number placeholder.  Every — single — time.

In order to support natural-sounding versions of special cases (such as “You have no messages”), at Wellmo we decided to extend the CLDR rules with two special categories: “none” and “single”.  These match the counts 0 and 1 for all languages, and fall back to the CLDR rules if they are missing.  The difference from using “zero” and “one” is that they don’t clash with CLDR categories, but are instead handled as completely separate keys.

Using these extra rules, the above example could be written as:

en:
  inbox:
    counting:
      one: You have {{count}} new message
      other: You have {{count}} new messages
      none: You have no messages
      single: You have one new message

The important distinction is that both “one” and “other” contain the {{count}} placeholder.  Thus, it makes no assumptions on the language rules.
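
The lookup order described above could be sketched in Python roughly as follows.  The helper names (pick_key, render) and the way the CLDR rule is passed in as a function are assumptions made for this illustration, not the actual Wellmo code.

```python
EN = {
    "one": "You have {{count}} new message",
    "other": "You have {{count}} new messages",
    "none": "You have no messages",
    "single": "You have one new message",
}


def english_category(count: int) -> str:
    return "one" if count == 1 else "other"


def pick_key(translations: dict, count: int, cldr_category) -> str:
    # The special keys match exact counts and win only if they are present.
    if count == 0 and "none" in translations:
        return "none"
    if count == 1 and "single" in translations:
        return "single"
    # Otherwise fall back to the language's CLDR category.
    return cldr_category(count)


def render(translations: dict, count: int, cldr_category) -> str:
    key = pick_key(translations, count, cldr_category)
    return translations[key].replace("{{count}}", str(count))


for n in (0, 1, 5):
    print(render(EN, n, english_category))
```

If a translation leaves out "none" or "single", the same call still works, because the CLDR form carries the {{count}} placeholder.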

When translated to Russian the corresponding entry would contain entries for “one”, “few”, “many”, “other”, “none” and “single”.  Automated tools can check that the placeholders for all of these correspond to the master language.
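
A placeholder check of this kind might look like the following Python sketch.  It assumes the {{name}} placeholder syntax from the examples and assumes that only the CLDR categories must carry the placeholders, while "none" and "single" may omit them; check_plural_entry is a hypothetical name, not our actual checker tool.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
CLDR_CATEGORIES = {"zero", "one", "two", "few", "many", "other"}


def placeholders(text: str) -> set:
    return set(PLACEHOLDER.findall(text))


def check_plural_entry(master: dict, translation: dict) -> list:
    """Return a list of problems; an empty list means the entry passed."""
    # Placeholders used by the CLDR forms of the master language.
    expected = set()
    for key, text in master.items():
        if key in CLDR_CATEGORIES:
            expected |= placeholders(text)
    problems = []
    for key, text in translation.items():
        found = placeholders(text)
        if key in CLDR_CATEGORIES and found != expected:
            problems.append(
                f"{key}: expected {sorted(expected)}, found {sorted(found)}"
            )
    return problems


master = {
    "one": "You have {{count}} new message",
    "other": "You have {{count}} new messages",
    "none": "You have no messages",
}
bad = {
    "one": "У вас есть 1 новое сообщение",
    "other": "У вас есть {{count}} новых сообщений",
}
print(check_plural_entry(master, bad))
```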

One thing to note is that in the example above, the English key en.inbox.counting.one will never be used, because the count 1 always resolves to en.inbox.counting.single instead.  I find this a minor grievance compared to the consistency and other benefits it brings.

Having complex plurals translated

Due to the different number of plural forms in different languages, there will inevitably be a different number of strings in different languages.  Some translation platforms, such as OneSky, have specific support for CLDR plurals.  When OneSky detects a CLDR plural form in the master language, it asks for the correct number of translations for the destination language.

Unfortunately, OneSky only supports detecting plural forms in specific file formats.  We had to update our continuous localization scripts to convert back and forth between Java property files (that we use) and Android XML localization files (which have specific support for plurals).
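
One direction of such a conversion could be sketched in Python as below.  The details are assumptions for illustration only: the property keys are assumed to end in the category name (for example inbox.counting.one), dots are replaced by underscores to form the Android resource name, and the already-parsed properties are passed in as a dict.  Placeholder syntax conversion, the reverse direction and the "none" / "single" keys (which are not Android quantities) are left out.

```python
import xml.etree.ElementTree as ET

CLDR_CATEGORIES = ("zero", "one", "two", "few", "many", "other")


def properties_to_android_plurals(properties: dict) -> str:
    """Group '<base>.<category>' keys into Android <plurals> elements."""
    groups = {}
    for key, text in properties.items():
        base, _, category = key.rpartition(".")
        if base and category in CLDR_CATEGORIES:
            groups.setdefault(base, {})[category] = text

    resources = ET.Element("resources")
    for base, forms in groups.items():
        plurals = ET.SubElement(resources, "plurals", name=base.replace(".", "_"))
        for category in CLDR_CATEGORIES:
            if category in forms:
                item = ET.SubElement(plurals, "item", quantity=category)
                item.text = forms[category]
    return ET.tostring(resources, encoding="unicode")


props = {
    "inbox.counting.one": "You have {{count}} new message",
    "inbox.counting.other": "You have {{count}} new messages",
    "inbox.counting.none": "You have no messages",
}
print(properties_to_android_plurals(props))
```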

Once set up, this system works like a charm:

  • Plural forms work correctly in any language we want to localize to
  • Specific forms are available for “no messages” and “one message” where desired
  • Translators are automatically prompted for the correct number of plural forms
  • Our translation checker tool ensures that all translations and placeholders are correct

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s