Why you need a localization tester tool

I’ve worked with four translation agencies across different projects. In each case we occasionally received faulty translations where, for example, placeholders were missing or incorrectly formatted. This happened even when the translator used a translation platform that warns about missing placeholders.

As with other quality aspects, you are ultimately responsible for localization quality, and automation is key to achieving it.

Therefore, in each project I’ve worked on, I’ve built a tool that performs basic checks on the translation files. Checks include:

  • The file format is correct
  • All translation files contain exactly the same keys and no duplicate keys
  • The same placeholders and HTML tags are present in the translations as in the master (make sure that {foo}} also triggers an error)
  • Correct pluralization forms are included per language (if complex pluralization is used)
  • No master or translation texts contain “TODO” or “FIXME”

The last point allows placing temporary localization keys in the master or translation files without worrying that they’re accidentally left in.
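As a sketch of what such a tester can look like, assuming flat key-to-text dictionaries and {name}-style placeholders (adapt the regex and file parsing to your own format):

```python
import re

PLACEHOLDER = re.compile(r"\{\w+\}")

def check_translation(master: dict, translation: dict) -> list:
    """Return a list of error messages for one translation file."""
    errors = []
    # All translation files must contain exactly the same keys as the master.
    if set(master) != set(translation):
        errors.append(f"key mismatch: {sorted(set(master) ^ set(translation))}")
    for key in sorted(set(master) & set(translation)):
        m_text, t_text = master[key], translation[key]
        # The same placeholders must be present in master and translation.
        if sorted(PLACEHOLDER.findall(m_text)) != sorted(PLACEHOLDER.findall(t_text)):
            errors.append(f"{key}: placeholder mismatch")
        # A brace left over after removing well-formed placeholders
        # catches malformed cases such as '{foo}}'.
        for text in (m_text, t_text):
            leftover = PLACEHOLDER.sub("", text)
            if "{" in leftover or "}" in leftover:
                errors.append(f"{key}: malformed placeholder in {text!r}")
        # No text may contain leftover TODO/FIXME markers.
        if any(marker in m_text + t_text for marker in ("TODO", "FIXME")):
            errors.append(f"{key}: TODO/FIXME marker present")
    return errors
```

Note that duplicate-key detection has to happen at parse time (for JSON files, e.g. with a custom `object_pairs_hook`), since an ordinary parser silently keeps only the last duplicate.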

Our CI is configured to run the localization tests only on the master branch, not on development branches. This allows development to proceed while texts and translations are not yet complete, while still requiring the localization files to be correct before deployment.

This works particularly well together with a continuous localization system.



UX Fails: Mac OS app switcher (and how to fix it)

A year ago, I made the switch from Linux to Mac OS at work. While the first couple of weeks were painful, I’ve gradually grown to love the Mac environment. It’s a good balance of a Unix-like development environment with ease of use and commercially supported software.

But no matter how praised Apple’s design is, they have failed in some areas. The Mac OS app switcher is one of them. A lot can be said on how an app switcher should work, but there are a couple of clear cock-ups in the Mac OS flavor.


Swearing in Unicode

Sometimes you just gotta swear, but don’t have the words for it. This is where grawlixes, or symbol swearing, come in.

For text-only locations such as chats, Unicode offers a wide variety of characters to use. Unfortunately, the standard keyboard is not very well designed for versatile grawlix swearing. Furthermore, many browsers have started to render even previously “normal” characters as colored emoji characters. Mixing regular black characters with colored emoji may not produce the best results.

I tried to find a good set of grawlix characters in the Unicode character set and ran a quick test of which ones are rendered as emoji by common browsers and which aren’t. The results are below (YMMV):

Regular: #@?!§%&❢☆★✩
Depends on browser: ⚡‼⁉☁✔✖❣
Emoji: 🌟🔪🔫✊🌀⛈🌧🌩🔨💣❗👊🗡☠💀

Based on this I’ve provided a couple of ready options to copy-paste for your swearing needs:





Do you have any more characters to suggest?


Entropy of a MongoDB ObjectId

A MongoDB ObjectId is a globally unique ID that can be generated on any machine with a high probability of being distinct from any other ObjectId generated elsewhere. Its contents are not random, but consist of a few pieces of data. While you should never rely on an ObjectId staying secret for security, there are cases where it’s important to understand how difficult a generated ObjectId is to guess.

Let’s consider a simpler case first: IPv4 addresses. For privacy reasons, you might not want to store real IP addresses, but you still want to count the number of distinct IPs used. The trivial solution is to store a SHA-1 hash of each IP address. The problem is that this is not really a one-way hash, because the source range is so limited: a modern GPU will test all possible IPv4 addresses in a fraction of a second.

In this case the source range of IPv4 contains only 32 bits of entropy. So how many bits of entropy does a MongoDB ObjectId contain?


ISO certification for dummies

Wellmo recently received its ISO 27001 certificate, the primary standard for an information security management system (ISMS). I was closely involved in defining the processes and policies needed to obtain the certificate, and will share some of my experiences in this post.


Complex plurals

Pluralization is a key topic in localization, but its complexity is very often overlooked. Every localization library supports singular and plural forms, but many languages have more than two plural forms; some even have six different forms!

We encountered this when localizing Wellmo to Polish. Polish has four different plural forms (three are shown below; the fourth is used with decimals):

0 steps = 0 kroków
1 step = 1 krok
2 steps = 2 kroki
3 steps = 3 kroki
4 steps = 4 kroki
5 steps = 5 kroków
6 steps = 6 kroków

Getting pluralization correct for all languages poses a challenge, and more often than not, it is done wrong.
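As an illustration, the integer part of the Polish rule (as defined in the Unicode CLDR plural rules) can be sketched as a small selector function, using the step counts above:

```python
def polish_plural(n: int) -> str:
    """Select the CLDR plural category for an integer count in Polish."""
    if n == 1:
        return "one"
    # 2-4, but not 12-14: "few"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    # everything else (0, 5-21, 25-31, ...): "many"
    return "many"

FORMS = {"one": "krok", "few": "kroki", "many": "kroków"}

def steps(n: int) -> str:
    return f"{n} {FORMS[polish_plural(n)]}"
```

Note the non-obvious cases: 22 steps is "22 kroki" again, while 12 steps is "12 kroków". Rules like this are why a library backed by CLDR data beats hand-rolled singular/plural switches.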


Continuous localization

In a lot of development (especially outside the US), localization is a must. You can have people 100 km away who don’t speak the same language. Unfortunately, localization is often a significant hindrance to agile development and continuous deployment.

In this post, I’ll describe how at Wellmo we set up a system where translation is a short, relatively easy step a bit before deployment.


Benchmarking AWS t2 instances

A couple of years ago I did some benchmarking on the AWS t1.micro instance. The t1.micro was AWS’s first burstable-performance instance type, which is appropriate for non-continuous CPU usage.

The CPU allocation of the t1.micro is not specified anywhere, and in practice the throttling is quite harsh: if you have a CPU-hungry process that takes over 10 seconds to run, it will be throttled.

The t2 instance family CPU allocation is clearly defined using CPU credits. You acquire CPU credits when idle and spend them when busy. Credits accumulate for up to 24 hours. Since the t2.nano earns 3 credits per hour and one credit buys one minute of a full CPU core, this allows a daily-run CPU-hungry process 72 minutes of full CPU power even on the measly t2.nano.

I wrote a simple program to test this functionality.  It performs floating point computations and every 10 seconds reports how many rounds it got through.
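The original program isn’t shown here, but the idea can be sketched roughly as follows (a Python stand-in, with the interval and duration as parameters; the actual benchmark used a 10-second interval):

```python
import time

def burn(report_interval: float = 10.0, duration: float = 600.0) -> list:
    """Spin on floating point math, recording rounds completed per interval."""
    results = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        interval_end = min(time.monotonic() + report_interval, end)
        rounds = 0
        x = 1.0001
        while time.monotonic() < interval_end:
            for _ in range(10_000):     # one "round" of floating point work
                x = x * 1.000001 % 10.0
            rounds += 1
        # CPU throttling shows up as a drop in rounds per interval.
        results.append(rounds)
    return results
```

Plotting the per-interval counts over time makes the credit exhaustion and the throttled baseline directly visible.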

Below is a graph of the results for t2.nano, t2.micro and t2.small instances.  CPU usage is scaled to the maximum 10 second performance of each instance.


As a comparison, below are the corresponding results for a 30-minute run on a t1.micro with 1 second resolution. The peaks are 10-15 seconds in length.


Before the test, the t2 instances were put under full load until the initial CPU credits had been spent, and then let be idle for 24 hours in order to accumulate full CPU credits.

The t2.micro was about 11% faster than the other two, as it happened to be launched on a 2.5 GHz Intel Xeon E5-2670 v2, while the other two were on a 2.4 GHz Intel Xeon E5-2676 v3.  The absolute performance will depend on which type of machine you happen to get.  With this performance test, the speed was approximately the same as a single core on my work laptop.

The t2.micro also experienced more random variability during the high load phase.  These were short variations, and all instances averaged 98-99% of the max CPU usage during the time.

The throttled performance was a very steady 5 / 10 / 20% of the maximum performance, with the exception of an additional speed boost of 5 minutes every 40 minutes.  No idea why that happens.

There’s a slight step down when CPU credits are exhausted before full throttling takes place.  It’s most prominent with the t2.nano, where the CPU dropped to 17% for 5 minutes before dropping to the base level of 5%.  The corresponding values were 5 minutes at 15% and 10 minutes at 25% for the t2.micro and t2.small, respectively.

Overall, the t2 instances provide much better predictability, control and flexibility on CPU performance than the t1.micro.  The t2 family is very well suited for general use, handling both continuous small bursts, and occasional longer bursts of activity gracefully.


When to hard-code

Hard-coding is generally considered an anti-pattern and abhorred by experienced developers. Input and configuration data should be externalized from the code, or at the very least parametrized to a language constant.

While working at Wellmo, I’ve come to reconsider this pattern. In fact, I now often advocate hard-coding special cases when first encountering them. I have written and tested production code reading

// TODO: Hard-coded logic
if ("myGroupId".equals(group.getGroupId())) {
    // do stuff
}

and I’m fine with it.

The rationale comes from lean principles: you shouldn’t build something you don’t know is needed. Especially in a startup, uncertainty is often the norm. New, very specific features are often required without full knowledge of how they will fit into the grand scheme of things.

The example code above is from a case where a certain group of users needed to see a group that was hidden from others. I knew that showing and hiding groups was something we would need in the future, but I wasn’t exactly sure what the conditions would be.

Instead of creating some generic mechanism for configuring group visibility, I hard-coded that case. Several months down the road, we now have more insight into what the proper logic is, and we are currently implementing that. It’s radically different from what I would have implemented at the time. Premature configurability would have been a waste of time.

Another example is a case where an item needed to be branded differently for certain customers. Instead of designing a way to configure the branding, we simply wrote an if-condition that selected between the two options. Half a year later the alternative branding was removed, and we simply removed the few lines of code. No generic logic was ever needed.

My suggestion is to hard-code cases when:

  • there is uncertainty on how the generic logic would work
  • only a single, short portion of code is affected
  • the proper, generic logic can later be introduced without affecting other code

It is imperative that hard-coded logic is replaced when more similar special cases arise. This requires good communication and understanding between management and the developers. Otherwise it may be difficult to explain why a third similar case is slower to implement than the first two.


Backup KeePass2 database on Linux

There are several instructions online on how to use the KeePass2 trigger mechanism to create a backup of your password database whenever you save it. However, all of the instructions I found were for Windows. It took a bit of figuring out the proper configuration on Linux, so here are the necessary steps:

  1. Select Tools → Triggers…
  2. Click Add…
  3. Type the name Backup database on save and make sure Enabled and Initially on are checked
  4. Under the Events tab click Add…, select Saving database file and click OK (ignore the conditions)
  5. Under the Actions tab click Add…, select Execute command line / URL and fill in the following:
    File/URL:  cp
    Arguments:  "{DB_PATH}" "{DB_PATH}.{DT_SIMPLE}"
    Wait for exit:  checked
  6. Click OK a few times

This will create a backup of the database file every time you save it (before the save).  The backup will be in the same directory as the original, with the current timestamp appended to the name.  If you prefer to backup to a different directory, use "/path/to/backups/{DB_BASENAME}.{DT_SIMPLE}" as the second argument instead.
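For reference, the trigger is effectively equivalent to running something like the following on each save ({DT_SIMPLE} expands to a yyyyMMddHHmmss timestamp; the path below is hypothetical):

```shell
# Simulate what the trigger does: copy the database to a timestamped backup.
db="/tmp/example.kdbx"                  # hypothetical database path
printf 'not a real database' > "$db"    # stand-in for the .kdbx file
cp "$db" "$db.$(date +%Y%m%d%H%M%S)"    # "{DB_PATH}" "{DB_PATH}.{DT_SIMPLE}"
ls -1 "$db".*
```

Since the backup runs before the save, the timestamped copy always holds the previous version of the database.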
