Complex plurals

Pluralization is a key topic in localization, but its complexity is very often overlooked.  Every localization library supports singular and plural forms, but many languages have more than two plural forms — some even have six different forms!

We encountered this when localizing Wellmo to Polish.  Polish has four different plural forms (below are three, the fourth is used in decimals):

0 steps = 0 kroków
1 step = 1 krok
2 steps = 2 kroki
3 steps = 3 kroki
4 steps = 4 kroki
5 steps = 5 kroków
6 steps = 6 kroków
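
The pattern in the list above can be sketched in code. The following is a minimal illustration, assuming the integer plural rule for Polish as published in the Unicode CLDR plural rules (1 is singular; numbers ending in 2-4 but not 12-14 take the second form; everything else takes the third). The class and method names are hypothetical, and the fourth, decimal form is not handled.

```java
// Sketch only: selects the Polish form of "step" for a non-negative integer.
// The rule is assumed from the CLDR plural rules for Polish; names are hypothetical.
final class PolishStepPlurals {

    // Returns "krok", "kroki" or "kroków" for a non-negative whole number.
    static String form(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        if (n == 1) {
            return "krok";                       // exactly one
        }
        boolean endsIn2To4 = mod10 >= 2 && mod10 <= 4;
        boolean isTeen = mod100 >= 12 && mod100 <= 14;
        if (endsIn2To4 && !isTeen) {
            return "kroki";                      // 2-4, 22-24, 32-34, ...
        }
        return "kroków";                         // 0, 5-21, 25-31, ...
    }

    // Formats a count together with the matching noun form.
    static String format(long n) {
        return n + " " + form(n);
    }

    public static void main(String[] args) {
        for (long n = 0; n <= 6; n++) {
            System.out.println(format(n));
        }
    }
}
```

A two-form "singular / plural" API cannot express this, because 5 and 22 both count as plural in English but take different forms in Polish.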

Getting pluralization correct for all languages poses a challenge, and more often than not, it is done wrong.

Continue reading

Posted in Coding, Localization

Continuous localization

In a lot of development (especially outside the US), localization is a must.  You can have people 100 km away who don’t speak the same language.  Unfortunately, localization is often a significant hindrance to agile development and continuous deployment.

In this post, I’ll describe how at Wellmo we set up a system where translation is a short, relatively easy step a bit before deployment.

Continue reading

Posted in Coding, Localization

Benchmarking AWS t2 instances

A couple of years ago I did some benchmarking on the AWS t1.micro instances.  The t1.micro was AWS’s first burstable-performance instance type, a class of instance suited to non-continuous CPU usage.

The CPU allocation of the t1.micro is not specified anywhere.  In practice the throttling is quite harsh.  If you have a CPU hungry process that takes over 10 seconds to run, it will be throttled.

The t2 instance family CPU allocation is clearly defined using CPU credits.  You acquire CPU credits when idle and spend them when busy.  The credits accumulate for up to 24 hours.  This allows a daily-run, CPU-hungry process 72 minutes of full CPU power even on the measly t2.nano.
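
The 72-minute figure can be checked with a little arithmetic. The sketch below assumes the AWS definition that one CPU credit equals one vCPU running at 100% for one minute, so an instance with a baseline of p% earns p% of 60 credits per hour. The class and method names are hypothetical, and the calculation ignores the credits that continue to accrue while the burst is running.

```java
// Sketch only: banked full-speed minutes after 24 idle hours on a t2 instance.
// Assumes 1 credit = 1 vCPU-minute at 100%; names are hypothetical.
final class T2Credits {

    static final int ACCRUAL_HOURS = 24;

    // baselinePercent is the throttled share of one vCPU, e.g. 5 for the t2.nano.
    static int fullSpeedMinutes(int baselinePercent) {
        int minutesPerDay = 60 * ACCRUAL_HOURS;
        // Earning p% of a vCPU-minute per minute for a day banks p% of the day's minutes.
        return baselinePercent * minutesPerDay / 100;
    }

    public static void main(String[] args) {
        System.out.println("t2.nano:  " + fullSpeedMinutes(5) + " min");
        System.out.println("t2.micro: " + fullSpeedMinutes(10) + " min");
        System.out.println("t2.small: " + fullSpeedMinutes(20) + " min");
    }
}
```

With baselines of 5, 10 and 20%, this gives 72, 144 and 288 minutes respectively.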

I wrote a simple program to test this functionality.  It performs floating point computations and every 10 seconds reports how many rounds it got through.
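
The original test program is not reproduced here. As a rough sketch of the same idea, the following hypothetical Java class counts how many rounds of floating point work fit into each time window and then scales the counts to the best window. The workload, the names and the short default window are all assumptions; pass 10000 as the first argument to get 10-second windows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: a stand-in for the benchmark described above, not the original program.
final class CpuBurstProbe {

    // Written after each window so the JIT cannot discard the computation.
    static volatile double sink;

    // Runs floating point work until windowMillis has elapsed; returns completed rounds.
    static long roundsInWindow(long windowMillis) {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        long rounds = 0;
        double acc = 0.0;
        while (System.nanoTime() < deadline) {
            for (int i = 1; i <= 10_000; i++) {
                acc += Math.sqrt(i) * Math.sin(i);
            }
            rounds++;
        }
        sink = acc;
        return rounds;
    }

    // Scales a window's result to the best window seen, as a whole percentage.
    static long percentOfMax(long rounds, long maxRounds) {
        return Math.round(100.0 * rounds / maxRounds);
    }

    public static void main(String[] args) {
        // Defaults are deliberately short; at least one window is assumed.
        long windowMillis = args.length > 0 ? Long.parseLong(args[0]) : 1_000;
        int windows = args.length > 1 ? Integer.parseInt(args[1]) : 3;

        List<Long> results = new ArrayList<>();
        for (int w = 0; w < windows; w++) {
            long rounds = roundsInWindow(windowMillis);
            results.add(rounds);
            System.out.println("window " + w + ": " + rounds + " rounds");
        }
        long max = Collections.max(results);
        for (long rounds : results) {
            System.out.println(percentOfMax(rounds, max) + "% of max");
        }
    }
}
```

On a throttled instance the per-window round count drops, which is what makes the credit exhaustion visible.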

Below is a graph of the results for t2.nano, t2.micro and t2.small instances.  CPU usage is scaled to the maximum 10 second performance of each instance.


As a comparison, below are the corresponding results for a 30-minute run on a t1.micro with 1 second resolution.  The peaks are 10-15 seconds in length.


Before the test, the t2 instances were put under full load until the initial CPU credits had been spent, and then left idle for 24 hours in order to accumulate full CPU credits.

The t2.micro was about 11% faster than the other two, as it happened to be launched on a 2.5 GHz Intel Xeon E5-2670 v2, while the other two were on a 2.4 GHz Intel Xeon E5-2676 v3.  The absolute performance will depend on which type of machine you happen to get.  With this performance test, the speed was approximately the same as a single core on my work laptop.

The t2.micro also experienced more random variability during the high load phase.  These were short variations, and all instances averaged 98-99% of the max CPU usage during the time.

The throttled performance was a very steady 5 / 10 / 20% of the maximum performance, with the exception of an additional speed boost of 5 minutes every 40 minutes.  No idea why that happens.

There’s a slight step down when CPU credits are exhausted before full throttling takes place.  It’s most prominent with the t2.nano, where the CPU dropped to 17% for 5 minutes before dropping to the base level of 5%.  The corresponding values were 5 minutes at 15% and 10 minutes at 25% for the t2.micro and t2.small, respectively.

Overall, the t2 instances provide much better predictability, control and flexibility on CPU performance than the t1.micro.  The t2 family is very well suited for general use, handling both continuous small bursts, and occasional longer bursts of activity gracefully.

Posted in Amazon AWS

When to hard-code

Hard-coding is generally considered an anti-pattern and abhorred by experienced developers. Input and configuration data should be externalized from the code, or at the very least parametrized to a language constant.

While working at Wellmo, I’ve come to reconsider this pattern. In fact, I now often advocate hard-coding special cases when first encountering them. I have written and tested production code reading

// TODO: Hard-coded logic
if ("myGroupId".equals(group.getGroupId())) {
    // do stuff
}

and I’m fine with it.

The rationale comes from lean principles. You shouldn’t build something that you don’t know is needed. Especially in a startup, uncertainty is often the norm. New, very specific features are often required without full knowledge of how they will fit into the grand scheme of things.

The above example code is from a case where a certain group of users needed to see a group that is hidden from others. I knew that showing / hiding groups is something we need in the future, but I wasn’t exactly sure what the conditions would be.

Instead of creating some generic mechanism for configuring group visibility, I hard-coded that case. Several months down the road, we now have more insight into what the proper logic is, and we are currently implementing that. It’s radically different from what I would have implemented at the time. Premature configurability would have been a waste of time.

Another example is a case where an item needed to be branded differently for certain customers. Instead of designing a way to configure the branding, we simply wrote an if-condition that selected between the two options. Half a year later the alternative branding was removed, and we simply removed the few lines of code. No generic logic was ever needed.

My suggestion is to hard-code cases when:

  • there is uncertainty on how the generic logic would work
  • only a single, short portion of code is affected
  • the proper, generic logic can later be introduced without affecting other code
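
As an illustration of these three conditions, here is a minimal sketch of the group visibility case with the special case confined to a single method. All class, method and parameter names are hypothetical and are not taken from the actual codebase.

```java
// Sketch only: the hard-coded special case lives in one short method,
// so a generic rule can replace it later without touching the callers.
final class GroupVisibility {

    // TODO: Hard-coded logic. Replace once the real visibility rules are known.
    static boolean mayViewHiddenGroups(String viewerGroupId) {
        return "myGroupId".equals(viewerGroupId);
    }

    // Callers depend only on this method, never on the hard-coded id.
    static boolean isVisible(boolean hidden, String viewerGroupId) {
        return !hidden || mayViewHiddenGroups(viewerGroupId);
    }

    public static void main(String[] args) {
        System.out.println(isVisible(true, "myGroupId"));
        System.out.println(isVisible(true, "someOtherGroup"));
    }
}
```

When the generic mechanism arrives, only the body of `mayViewHiddenGroups` needs to change.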

It is imperative that hard-coded logic is replaced when more similar special cases arise. This requires good communication and understanding between management and the developers. Otherwise it may be difficult to explain why a third similar case is slower to implement than the first two.

Posted in Coding

Backup KeePass2 database on Linux

There are several sets of instructions on how to use the KeePass2 trigger mechanism to create a backup of your password database when saving the database.  However, all of the instructions I found were for Windows.  It took a bit of work to figure out the proper configuration on Linux, so here are the necessary steps:

  1. Select Tools → Triggers…
  2. Click Add…
  3. Type the name Backup database on save and make sure Enabled and Initially on are checked
  4. Under the Events tab click Add…, select Saving database file and click OK (ignore the conditions)
  5. Under the Actions tab click Add…, select Execute command line / URL and type in the following:
    File/URL:  cp
    Arguments:  "{DB_PATH}" "{DB_PATH}.{DT_SIMPLE}"
    Wait for exit:  checked
  6. Click OK a few times

This will create a backup of the database file every time you save it (before the save).  The backup will be in the same directory as the original, with the current timestamp appended to the name.  If you prefer to back up to a different directory, use "/path/to/backups/{DB_BASENAME}.{DT_SIMPLE}" as the second argument instead.

Posted in KeePass, Security

Hybrid app testing using dynamic DNS

Hybrid apps simplify implementing cross-platform mobile applications in many ways.  You only need to write the HTML once, and it should work on all platforms.  However, you still need to test those platforms.

At Wellmo, we do most of our development on local environments using a browser, and test the functionality on phones afterwards.  This poses a problem:  We have a bunch of test devices, but each time you’d want to test on one, you need to install a new native client pointing to the appropriate environment (often your own local environment).  What’s worse is that iOS applications can be developed / deployed only on a Mac and Windows Phone applications only on Windows.

We solved this issue by using dynamic DNS addresses for each device, which allows any developer to point any device at any environment with a single command.

Continue reading

Posted in Hybrid apps, Testing

Using Spark with MongoDB

I recently started investigating Apache Spark as a framework for data mining. Spark builds upon Apache Hadoop, and supports many more operations than plain map-reduce. It also supports streaming data and iterative algorithms.

Since Spark builds upon Hadoop and HDFS, it is compatible with any HDFS data source. Our server uses MongoDB, so we naturally turned to the mongo-hadoop connector, which allows reading and writing directly from a Mongo database.

However, it was far from obvious (at least for a beginner with Spark) how to use and configure mongo-hadoop together with Spark. After a lot of experimentation, frustration, and a few emails to the Spark user mailing list, I got it working in both Java and Scala. I wrote this tutorial to save others the exasperation.

Read below for details. The impatient can just grab the example application code.

Continue reading

Posted in Coding, MongoDB, Spark