Entropy of a MongoDB ObjectId

A MongoDB ObjectId is a globally unique ID that can be generated on any machine with a high probability of being distinct from any other ObjectId generated elsewhere. Its content is not random, but consists of a few pieces of data. While you should never rely on knowing an ObjectId value for security, there are cases where it's important to understand how difficult it is to guess a generated ObjectId.

Let's consider a simpler case: IPv4 addresses. For privacy reasons, you might not want to store real IP addresses, but you still want to count the number of distinct IPs used. The trivial solution is to take a SHA-1 hash of each IP address and store the hashes instead. The problem is that this is not really a one-way hash in practice, because the source space is so limited. A modern GPU will test all possible IPv4 addresses in a fraction of a second.

In this case the source space of IPv4 addresses contains only 32 bits of entropy. So how many bits of entropy does a MongoDB ObjectId contain?
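
To make the brute-force claim concrete, below is a minimal sketch in Java of recovering an IP address from its stored hash by simply trying every candidate. It uses only the standard MessageDigest API; the class name, the hex helper and the idea of passing the stored hash on the command line are illustrative assumptions, and a real attack would use an optimized GPU kernel rather than a Java loop.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Ipv4BruteForce {
    public static void main(String[] args) throws Exception {
        // Hypothetical stored SHA-1 hash of an IP address in dotted-decimal form
        String targetHash = args[0];
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");

        // Only 2^32 candidates exist; a GPU with an optimized SHA-1 kernel
        // exhausts them almost instantly, which is the whole problem
        for (long ip = 0; ip <= 0xFFFFFFFFL; ip++) {
            String candidate = (ip >>> 24) + "." + ((ip >>> 16) & 0xFF) + "."
                    + ((ip >>> 8) & 0xFF) + "." + (ip & 0xFF);
            byte[] digest = sha1.digest(candidate.getBytes(StandardCharsets.UTF_8));
            if (toHex(digest).equals(targetHash)) {
                System.out.println("The hashed address was " + candidate);
                return;
            }
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}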


ISO certification for dummies

Wellmo recently received its ISO 27001 certificate. ISO 27001 is the primary standard for an information security management system (ISMS). I was closely involved in defining the processes and policies needed to obtain the certificate, and will share some of my experiences in this post.


Complex plurals

Pluralization is a key topic in localization, but its complexity is very often overlooked.  Every localization library supports singular and plural forms, but many languages have more than two plural forms — some even have six different forms!

We encountered this when localizing Wellmo to Polish. Polish has four different plural forms (the examples below show three of them; the fourth is only used with fractional numbers):

0 steps = 0 kroków
1 step = 1 krok
2 steps = 2 kroki
3 steps = 3 kroki
4 steps = 4 kroki
5 steps = 5 kroków
6 steps = 6 kroków

Getting pluralization correct for all languages poses a challenge, and more often than not, it is done wrong.
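
The selection logic itself is short enough to write out by hand. Here is a minimal sketch in Java of the Polish rule for whole-number counts; the class and method names are made up for illustration, and in real code you would let a localization library apply the CLDR plural rules instead of hand-rolling them.

// Illustrative sketch of the Polish plural rule for whole numbers;
// the names below are hypothetical, not from any localization library.
public class PolishPlurals {

    static String steps(int n) {
        if (n == 1) {
            return n + " krok";                       // singular
        }
        int mod10 = n % 10;
        int mod100 = n % 100;
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) {
            return n + " kroki";                      // 2-4, 22-24, 32-34, ...
        }
        return n + " kroków";                         // 0, 5-21, 25-31, ...
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++) {
            System.out.println(steps(n));             // matches the list above
        }
    }
}

The fourth form mentioned above only applies to fractional values, so it never shows up with a whole-number step count like this.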


Continuous localization

In a lot of development (especially outside the US), localization is a must. You can have people 100 km away who don't speak the same language. Unfortunately, localization is often a significant hindrance to agile development and continuous deployment.

In this post, I'll describe how we at Wellmo set up a system where translation is a short, relatively easy step shortly before deployment.


Benchmarking AWS t2 instances

A couple of years ago I did some benchmarking on the AWS t1.micro instances. The t1.micro was AWS's first burstable-performance instance type, which is appropriate for non-continuous CPU usage.

The CPU allocation of the t1.micro is not specified anywhere. In practice the throttling is quite harsh: a CPU-hungry process that runs for more than about 10 seconds will be throttled.

The CPU allocation of the t2 instance family is clearly defined using CPU credits. You earn credits while idle and spend them while busy, and the balance accumulates for up to 24 hours. Each credit is worth one minute of a full CPU core, and even the measly t2.nano earns 3 credits per hour at its baseline, so 24 idle hours bank 72 credits and give a daily-run CPU-hungry process 72 minutes of full CPU power.

I wrote a simple program to test this. It performs floating-point computations and reports every 10 seconds how many rounds it got through.
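
The original source isn't included in the post, but the idea is roughly the following sketch in Java; the per-round arithmetic and the output format are my assumptions, not the actual benchmark code.

public class CpuBurn {
    public static void main(String[] args) {
        long rounds = 0;
        double x = 1.0;
        long intervalStart = System.currentTimeMillis();
        while (true) {
            // Some floating-point busywork per round
            for (int i = 0; i < 1_000_000; i++) {
                x = Math.sqrt(x + i) + 1.0;
            }
            rounds++;
            long now = System.currentTimeMillis();
            if (now - intervalStart >= 10_000) {
                System.out.println(rounds + " rounds in the last 10 seconds");
                rounds = 0;
                intervalStart = now;
            }
        }
    }
}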

Below is a graph of the results for t2.nano, t2.micro and t2.small instances.  CPU usage is scaled to the maximum 10 second performance of each instance.

[Graph: CPU usage over time for the t2.nano, t2.micro and t2.small instances]

As a comparison, below are the corresponding results for a 30-minute run on a t1.micro with 1 second resolution. The peaks are 10-15 seconds in length.

[Graph: CPU usage for a 30-minute run on a t1.micro, 1 second resolution]

Before the test, the t2 instances were put under full load until the initial CPU credits had been spent, and then left idle for 24 hours in order to accumulate a full CPU credit balance.

The t2.micro was about 11% faster than the other two, as it happened to be launched on a 2.5 GHz Intel Xeon E5-2670 v2, while the other two were on a 2.4 GHz Intel Xeon E5-2676 v3.  The absolute performance will depend on which type of machine you happen to get.  With this performance test, the speed was approximately the same as a single core on my work laptop.

The t2.micro also experienced more random variability during the high load phase.  These were short variations, and all instances averaged 98-99% of the max CPU usage during the time.

The throttled performance was a very steady 5 / 10 / 20% of the maximum performance for the t2.nano, t2.micro and t2.small, respectively, with the exception of an additional speed boost of 5 minutes every 40 minutes. I have no idea why that happens.

There’s a slight step down when CPU credits are exhausted before full throttling takes place.  It’s most prominent with the t2.nano, where the CPU dropped to 17% for 5 minutes before dropping to the base level of 5%.  The corresponding values were 5 minutes at 15% and 10 minutes at 25% for the t2.micro and t2.small, respectively.

Overall, the t2 instances provide much better predictability, control and flexibility on CPU performance than the t1.micro.  The t2 family is very well suited for general use, handling both continuous small bursts, and occasional longer bursts of activity gracefully.


When to hard-code

Hard-coding is generally considered an anti-pattern and abhorred by experienced developers. Input and configuration data should be externalized from the code, or at the very least extracted into a named constant.

While working at Wellmo, I’ve come to reconsider this pattern. In fact, I now often advocate hard-coding special cases when first encountering them. I have written and tested production code reading

// TODO: Hard-coded logic
if ("myGroupId".equals(group.getGroupId())) {
    // do stuff
}

and I’m fine with it.

The rationale comes from lean principles: you shouldn't build something that you don't know is needed. Especially in a startup, uncertainty is often the norm. New, very specific features are often required without full knowledge of how they will fit into the grand scheme of things.

The above example code is from a case where a certain group of users needed to see a group that is hidden from others. I knew that showing and hiding groups was something we would need in the future, but I wasn't exactly sure what the conditions would be.

Instead of creating some generic mechanism for configuring group visibility, I hard-coded that case. Several months down the road, we now have more insight into what the proper logic is, and we are currently implementing that. It’s radically different from what I would have implemented at the time. Premature configurability would have been a waste of time.

Another example is a case where an item needed to be branded differently for certain customers. Instead of designing a way to configure the branding, we simply wrote an if-condition that selected between the two options. Half a year later the alternative branding was removed, and we simply removed the few lines of code. No generic logic was ever needed.
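
In code, that decision amounted to nothing more than a condition in the style of the earlier snippet; the identifiers here are made up for illustration, not the actual ones:

// TODO: Hard-coded branding for one customer (hypothetical identifiers)
if ("specialCustomerId".equals(customer.getCustomerId())) {
    item.setBrandingVariant("alternative");
} else {
    item.setBrandingVariant("default");
}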

My suggestion is to hard-code cases when:

  • there is uncertainty on how the generic logic would work
  • only a single, short portion of code is affected
  • the proper, generic logic can later be introduced without affecting other code

It is imperative that hard-coded logic is replaced when more similar special cases arise. This requires good communication and understanding between management and the developers. Otherwise it may be difficult to explain why a third similar case is slower to implement than the first two.


Backup KeePass2 database on Linux

There are several guides on how to use the KeePass2 trigger mechanism to create a backup of your password database whenever you save it. However, all of the instructions I found were for Windows. It took a bit of experimenting to figure out the proper configuration on Linux, so here are the necessary steps:

  1. Select Tools → Triggers…
  2. Click Add…
  3. Type the name Backup database on save and make sure Enabled and Initially on are checked
  4. Under the Events tab click Add…, select Saving database file and click OK (the conditions can be ignored)
  5. Under the Actions tab click Add…, select Execute command line / URL and fill in the following:
    File/URL: cp
    Arguments: "{DB_PATH}" "{DB_PATH}.{DT_SIMPLE}"
    Wait for exit: checked
  6. Click OK a few times

This will create a backup of the database file every time you save it (taken just before the save). The backup will be in the same directory as the original, with the current timestamp appended to the name. If you prefer to back up to a different directory, use "/path/to/backups/{DB_BASENAME}.{DT_SIMPLE}" as the second argument instead.
