Why you need a localization tester tool

I’ve worked with four different translation agencies on different projects. With each of them, we occasionally received faulty translations where, for example, placeholders were missing or incorrectly formatted. This happened even when the translator used a translation platform that warns about missing placeholders.

As with other aspects of quality, you are ultimately responsible for the result, and automation is key.

Therefore, in each project I’ve worked on, I’ve built a tool that performs basic checks on the translation files. The checks include:

  • The file format is correct
  • All translation files contain exactly the same keys and no duplicate keys
  • The translations contain the same placeholders and HTML tags as the master (make sure that a malformed placeholder such as {foo}} also causes an error)
  • Correct pluralization forms are included per language (if complex pluralization is used)
  • No master or translation texts contain “TODO” or “FIXME”

The last point allows placing temporary localization keys in the master or translation files without worrying that they’re accidentally left in.
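Below is a minimal Python sketch of some of these checks. It assumes flat JSON files that map each key to a string, with placeholders written as `{name}`. The regular expressions, function names and messages are hypothetical and should be adapted to your actual file format. The pluralization check is left out because it depends on the plural syntax of your localization library.

```python
import json
import re
from collections import Counter

# Assumption: placeholders look like {name}; adapt to your library's syntax.
PLACEHOLDER = re.compile(r"\{(\w+)\}")
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")
MARKERS = ("TODO", "FIXME")


def load_strict(text: str) -> dict:
    """Parse JSON (the file format check), rejecting duplicate keys."""
    def no_duplicates(pairs):
        counts = Counter(key for key, _ in pairs)
        dupes = [key for key, n in counts.items() if n > 1]
        if dupes:
            raise ValueError(f"duplicate keys: {dupes}")
        return dict(pairs)
    return json.loads(text, object_pairs_hook=no_duplicates)


def tokens(text: str):
    """Return placeholders and HTML tags as a multiset, plus a stray-brace flag."""
    stripped = PLACEHOLDER.sub("", text)
    stray = "{" in stripped or "}" in stripped  # catches e.g. {foo}}
    found = Counter(PLACEHOLDER.findall(text)) + Counter(HTML_TAG.findall(text))
    return found, stray


def check(master: dict, translation: dict) -> list:
    """Return a list of problems; an empty list means the translation passed."""
    problems = []
    for key in sorted(master.keys() - translation.keys()):
        problems.append(f"missing key: {key}")
    for key in sorted(translation.keys() - master.keys()):
        problems.append(f"extra key: {key}")
    for key in sorted(master.keys() & translation.keys()):
        want, master_stray = tokens(master[key])
        got, stray = tokens(translation[key])
        if master_stray or stray:
            problems.append(f"malformed placeholder in {key}")
        if want != got:
            problems.append(f"placeholder/tag mismatch in {key}")
    for name, texts in (("master", master), ("translation", translation)):
        for key, value in texts.items():
            if any(marker in value for marker in MARKERS):
                problems.append(f"{name} text for {key} contains TODO/FIXME")
    return problems
```

A CI job would load the master and each translation file with `load_strict` and fail the build if `check` returns any problems.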

Our CI is configured to run the localization tests only in the master branch, not in development branches. This allows development to proceed while texts and translations are still incomplete, but the localization files must be correct before deployment.
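The branch gating can be sketched as a small Python entry point. The environment variable `CI_BRANCH` is a hypothetical stand-in, because each CI system names it differently. `run_checks` is assumed to be any callable that returns a list of problem messages.

```python
import os
from typing import Callable, List


def should_run_localization_checks(branch: str) -> bool:
    """Strict checks run only on master; development branches may hold incomplete texts."""
    return branch == "master"


def main(run_checks: Callable[[], List[str]]) -> int:
    """Return a process exit code: 0 = passed or skipped, 1 = problems found."""
    # Hypothetical variable name; real CI systems expose the branch under other names.
    branch = os.environ.get("CI_BRANCH", "")
    if not should_run_localization_checks(branch):
        print(f"Skipping localization checks on branch {branch!r}")
        return 0
    problems = run_checks()
    for problem in problems:
        print(f"localization: {problem}")
    return 1 if problems else 0
```

In the CI job this would be invoked as `sys.exit(main(your_checker))`, where `your_checker` is hypothetical. The non-zero exit code is what makes the master build fail.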

This works particularly well together with a continuous localization system.


Posted in Continuous integration, Localization, Uncategorized

UX Fails: Mac OS app switcher (and how to fix it)

A year ago, I switched from Linux to Mac OS at work. While the first couple of weeks were painful, I’ve gradually grown to love the Mac environment. It strikes a good balance between a Unix-like development environment, ease of use and commercially supported software.

But however much Apple’s design is praised, it has failed in some areas, and the Mac OS app switcher is one of them. A lot can be said about how an app switcher should work, but the Mac OS flavor has a couple of clear cock-ups.

Posted in Usability, UX, UX Failure

Swearing in Unicode

Sometimes you just gotta swear, but don’t have the words for it. This is where grawlixes, or symbol swearing, come in.

For text-only places such as chats, Unicode offers a wide variety of characters. Unfortunately, the standard keyboard is not designed for versatile grawlix swearing. Furthermore, many browsers have started to render even previously “normal” characters as colored emoji, and mixing regular black characters with colored emoji may not produce the best results.

I tried to find a good set of grawlix characters in the Unicode character set and did a quick test of which ones common browsers render as emoji and which they don’t. The results are below (YMMV):

Regular: #@?!§%&❢☆★✩
Depends on browser: ⚡‼⁉☁✔✖❣
Emoji: 🌟🔪🔫✊🌀⛈🌧🌩🔨💣❗👊🗡☠💀

Based on this, I’ve provided a couple of ready-made options to copy and paste for your swearing needs:





Do you have any more characters to suggest?

Posted in Unicode

Entropy of a MongoDB ObjectId

A MongoDB ObjectId is a globally unique ID that can be generated on any machine, with a high probability of being distinct from every other ObjectId generated elsewhere. Its contents are not random but consist of a few pieces of data. While you should never rely on an ObjectId being unguessable for security, there are cases where it’s important to understand how difficult it is to guess a generated ObjectId.
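The pieces of data can be pulled apart with a short Python sketch, shown below. It assumes the layout in the current MongoDB ObjectId specification: a 4-byte big-endian timestamp in seconds, a 5-byte per-process value and a 3-byte big-endian counter. Older drivers filled the middle 5 bytes with a 3-byte machine identifier and a 2-byte process id instead. The function name `split_object_id` is hypothetical, and the sketch only splits the fields; it does not estimate their entropy.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Split a 24-hex-digit ObjectId into its fields (current spec layout)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # creation time, seconds since epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "process_value": raw[4:9].hex(),  # 5 bytes, fixed per generating process
        "counter": int.from_bytes(raw[9:12], "big"),  # incremented per ObjectId
    }
```

For example, `split_object_id("507f1f77bcf86cd799439011")` yields a timestamp of 2012-10-17 21:13:27 UTC and a counter of 4427793. The timestamp is easy to guess for anyone who knows roughly when the document was created.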

Let’s consider a simpler case: IPv4 addresses. For privacy reasons, you might not want to store real IP addresses, but you still want to count the number of distinct IPs used. The trivial solution is to store a SHA-1 hash of each IP address instead. The problem is that this is not really a one-way hash, because the input space is so small: a modern GPU can hash every possible IPv4 address in a fraction of a second.
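The attack is plain enumeration, as this minimal Python sketch shows. It assumes the addresses were stored as unsalted SHA-1 hex digests of their dotted-decimal text. The function names are hypothetical. The demo searches only a /16 block so it finishes quickly in pure Python. The full IPv4 space is just 2**32 candidates, which is what makes the GPU attack practical.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """The 'trivial' pseudonymization: unsalted SHA-1 of the address text."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def reverse_hash(target: str, network: str) -> Optional[str]:
    """Recover an address by hashing every candidate in the given network."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == target:
            return candidate
    return None  # the address is not in this network
```

With these definitions, `reverse_hash(hash_ip("192.168.37.5"), "192.168.0.0/16")` recovers `"192.168.37.5"` after at most 65,536 hashes.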

In this case, the IPv4 input space contains only 32 bits of entropy. So how many bits of entropy does a MongoDB ObjectId contain?

Posted in MongoDB, Security

ISO certification for dummies

Wellmo recently received its ISO 27001 certification. ISO 27001 is the primary standard for an information security management system (ISMS). I was closely involved in defining the processes and policies needed to obtain the certificate, and I’ll share some of my experiences in this post.

Posted in Certification, ISO 27001, Security

Complex plurals

Pluralization is a key topic in localization, but its complexity is very often overlooked. Every localization library supports singular and plural forms, but many languages have more than two plural forms, and some even have six!

We encountered this when localizing Wellmo into Polish. Polish has four plural forms (three are shown below; the fourth is used for fractional numbers):

0 steps = 0 kroków
1 step = 1 krok
2 steps = 2 kroki
3 steps = 3 kroki
4 steps = 4 kroki
5 steps = 5 kroków
6 steps = 6 kroków

Getting pluralization correct for all languages poses a challenge, and more often than not, it is done wrong.
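The minimal Python sketch below covers the integer part of the Polish rule. It follows the CLDR plural categories "one", "few" and "many". CLDR puts fractional numbers in "other", which this sketch does not handle. The message table `STEPS_PL` and the function names are hypothetical, for illustration only.

```python
def polish_plural_category(n: int) -> str:
    """CLDR plural category for a non-negative integer in Polish."""
    if n == 1:
        return "one"
    last_digit = n % 10
    last_two = n % 100
    # 2-4 take the "few" form, except the teens 12-14.
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"
    # Everything else (0, 5-21, 25-31, ...) takes "many".
    return "many"


# Hypothetical message table for the step counter.
STEPS_PL = {"one": "{n} krok", "few": "{n} kroki", "many": "{n} kroków"}


def format_steps_pl(n: int) -> str:
    return STEPS_PL[polish_plural_category(n)].format(n=n)
```

The tricky cases are the teens. 12 to 14 take "kroków" even though they end in 2 to 4, while 22 takes "kroki" again. A simple singular/plural pair cannot express this.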

Posted in Coding, Localization

Continuous localization

In much software development (especially outside the US), localization is a must: people just 100 km away may not speak the same language. Unfortunately, localization is often a significant hindrance to agile development and continuous deployment.

In this post, I’ll describe how we set up a system at Wellmo where translation is a short, relatively easy step shortly before deployment.

Posted in Coding, Localization