Entropy of a MongoDB ObjectId

A MongoDB ObjectId is a globally unique ID that can be generated on any machine with a high probability of being distinct from any other ObjectId generated elsewhere. Its content is not random, but consists of a few pieces of data. While you should never rely on knowing an ObjectId value for security, there are cases where it's important to understand how difficult it is to guess a generated ObjectId.

Let's consider a simpler case: IPv4 addresses. For privacy reasons, you might not want to store real IP addresses, but you still want to count the number of distinct IPs used. The trivial solution is to take a SHA-1 hash of each IP address and store the hashes instead. The problem is that this is not really a one-way hash in practice, because the source space is so limited. A modern GPU will test all possible IPv4 addresses in a fraction of a second.

In this case the source space of IPv4 addresses contains only 32 bits of entropy. So how many bits of entropy does a MongoDB ObjectId contain?
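
To make the brute-force claim concrete, below is a minimal sketch in Java of recovering an IP address from its stored hash by simply trying every candidate. It uses only the standard MessageDigest API; the class name, the hex helper and the idea of passing the stored hash on the command line are illustrative assumptions, and a real attack would use an optimized GPU kernel rather than a Java loop.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Ipv4BruteForce {
    public static void main(String[] args) throws Exception {
        // Hypothetical stored SHA-1 hash of an IP address in dotted-decimal form
        String targetHash = args[0];
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");

        // Only 2^32 candidates exist; a GPU with an optimized SHA-1 kernel
        // exhausts them almost instantly, which is the whole problem
        for (long ip = 0; ip <= 0xFFFFFFFFL; ip++) {
            String candidate = (ip >>> 24) + "." + ((ip >>> 16) & 0xFF) + "."
                    + ((ip >>> 8) & 0xFF) + "." + (ip & 0xFF);
            byte[] digest = sha1.digest(candidate.getBytes(StandardCharsets.UTF_8));
            if (toHex(digest).equals(targetHash)) {
                System.out.println("The hashed address was " + candidate);
                return;
            }
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}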


ISO certification for dummies

Wellmo recently received its ISO 27001 certificate. ISO 27001 is the primary standard for an information security management system (ISMS). I was closely involved in defining the processes and policies needed to obtain the certificate, and will share some of my experiences in this post.


Complex plurals

Pluralization is a key topic in localization, but its complexity is very often overlooked.  Every localization library supports singular and plural forms, but many languages have more than two plural forms — some even have six different forms!

We encountered this when localizing Wellmo to Polish. Polish has four different plural forms (the examples below show three of them; the fourth is only used with fractional numbers):

0 steps = 0 kroków
1 step = 1 krok
2 steps = 2 kroki
3 steps = 3 kroki
4 steps = 4 kroki
5 steps = 5 kroków
6 steps = 6 kroków

Getting pluralization correct for all languages poses a challenge, and more often than not, it is done wrong.
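
The selection logic itself is short enough to write out by hand. Here is a minimal sketch in Java of the Polish rule for whole-number counts; the class and method names are made up for illustration, and in real code you would let a localization library apply the CLDR plural rules instead of hand-rolling them.

// Illustrative sketch of the Polish plural rule for whole numbers;
// the names below are hypothetical, not from any localization library.
public class PolishPlurals {

    static String steps(int n) {
        if (n == 1) {
            return n + " krok";                       // singular
        }
        int mod10 = n % 10;
        int mod100 = n % 100;
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) {
            return n + " kroki";                      // 2-4, 22-24, 32-34, ...
        }
        return n + " kroków";                         // 0, 5-21, 25-31, ...
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++) {
            System.out.println(steps(n));             // matches the list above
        }
    }
}

The fourth form mentioned above only applies to fractional values, so it never shows up with a whole-number step count like this.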


Continuous localization

In a lot of development (especially outside the US), localization is a must. You can have people 100 km away who don't speak the same language. Unfortunately, localization is often a significant hindrance to agile development and continuous deployment.

In this post, I'll describe how we at Wellmo set up a system where translation is a short, relatively easy step shortly before deployment.


Benchmarking AWS t2 instances

A couple of years ago I did some benchmarking on the AWS t1.micro instances. The t1.micro was AWS's first burstable-performance instance type, which is appropriate for non-continuous CPU usage.

The CPU allocation of the t1.micro is not specified anywhere. In practice the throttling is quite harsh: a CPU-hungry process that runs for more than about 10 seconds will be throttled.

The CPU allocation of the t2 instance family is clearly defined using CPU credits. You earn credits while idle and spend them while busy, and the balance accumulates for up to 24 hours. Each credit is worth one minute of a full CPU core, and even the measly t2.nano earns 3 credits per hour at its baseline, so 24 idle hours bank 72 credits and give a daily-run CPU-hungry process 72 minutes of full CPU power.

I wrote a simple program to test this. It performs floating-point computations and reports every 10 seconds how many rounds it got through.
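
The original source isn't included in the post, but the idea is roughly the following sketch in Java; the per-round arithmetic and the output format are my assumptions, not the actual benchmark code.

public class CpuBurn {
    public static void main(String[] args) {
        long rounds = 0;
        double x = 1.0;
        long intervalStart = System.currentTimeMillis();
        while (true) {
            // Some floating-point busywork per round
            for (int i = 0; i < 1_000_000; i++) {
                x = Math.sqrt(x + i) + 1.0;
            }
            rounds++;
            long now = System.currentTimeMillis();
            if (now - intervalStart >= 10_000) {
                System.out.println(rounds + " rounds in the last 10 seconds");
                rounds = 0;
                intervalStart = now;
            }
        }
    }
}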

Below is a graph of the results for t2.nano, t2.micro and t2.small instances.  CPU usage is scaled to the maximum 10 second performance of each instance.

[Graph: CPU usage over time for the t2.nano, t2.micro and t2.small instances]

As a comparison, below are the corresponding results for a 30-minute run on a t1.micro with 1 second resolution. The peaks are 10-15 seconds in length.

[Graph: CPU usage for a 30-minute run on a t1.micro, 1 second resolution]

Before the test, the t2 instances were put under full load until the initial CPU credits had been spent, and then left idle for 24 hours in order to accumulate a full CPU credit balance.

The t2.micro was about 11% faster than the other two, as it happened to be launched on a 2.5 GHz Intel Xeon E5-2670 v2, while the other two were on a 2.4 GHz Intel Xeon E5-2676 v3.  The absolute performance will depend on which type of machine you happen to get.  With this performance test, the speed was approximately the same as a single core on my work laptop.

The t2.micro also experienced more random variability during the high load phase.  These were short variations, and all instances averaged 98-99% of the max CPU usage during the time.

The throttled performance was a very steady 5 / 10 / 20% of the maximum performance for the t2.nano, t2.micro and t2.small, respectively, with the exception of an additional speed boost of 5 minutes every 40 minutes. I have no idea why that happens.

There’s a slight step down when CPU credits are exhausted before full throttling takes place.  It’s most prominent with the t2.nano, where the CPU dropped to 17% for 5 minutes before dropping to the base level of 5%.  The corresponding values were 5 minutes at 15% and 10 minutes at 25% for the t2.micro and t2.small, respectively.

Overall, the t2 instances provide much better predictability, control and flexibility on CPU performance than the t1.micro.  The t2 family is very well suited for general use, handling both continuous small bursts, and occasional longer bursts of activity gracefully.


When to hard-code

Hard-coding is generally considered an anti-pattern and abhorred by experienced developers. Input and configuration data should be externalized from the code, or at the very least extracted into a named constant.

While working at Wellmo, I’ve come to reconsider this pattern. In fact, I now often advocate hard-coding special cases when first encountering them. I have written and tested production code reading

// TODO: Hard-coded logic
if ("myGroupId".equals(group.getGroupId())) {
    // do stuff
}

and I’m fine with it.

The rationale comes from lean principles: you shouldn't build something that you don't know is needed. Especially in a startup, uncertainty is often the norm. New, very specific features are often required without full knowledge of how they will fit into the grand scheme of things.

The above example code is from a case where a certain group of users needed to see a group that is hidden from others. I knew that showing and hiding groups was something we would need in the future, but I wasn't exactly sure what the conditions would be.

Instead of creating some generic mechanism for configuring group visibility, I hard-coded that case. Several months down the road, we now have more insight into what the proper logic is, and we are currently implementing that. It’s radically different from what I would have implemented at the time. Premature configurability would have been a waste of time.

Another example is a case where an item needed to be branded differently for certain customers. Instead of designing a way to configure the branding, we simply wrote an if-condition that selected between the two options. Half a year later the alternative branding was removed, and we simply removed the few lines of code. No generic logic was ever needed.
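
In code, that decision amounted to nothing more than a condition in the style of the earlier snippet; the identifiers here are made up for illustration, not the actual ones:

// TODO: Hard-coded branding for one customer (hypothetical identifiers)
if ("specialCustomerId".equals(customer.getCustomerId())) {
    item.setBrandingVariant("alternative");
} else {
    item.setBrandingVariant("default");
}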

My suggestion is to hard-code cases when:

  • there is uncertainty on how the generic logic would work
  • only a single, short portion of code is affected
  • the proper, generic logic can later be introduced without affecting other code

It is imperative that hard-coded logic is replaced when more similar special cases arise. This requires good communication and understanding between management and the developers. Otherwise it may be difficult to explain why a third similar case is slower to implement than the first two.


Backup KeePass2 database on Linux

There are several guides on how to use the KeePass2 trigger mechanism to create a backup of your password database whenever you save it. However, all of the instructions I found were for Windows. It took a bit of experimenting to figure out the proper configuration on Linux, so here are the necessary steps:

  1. Select Tools → Triggers…
  2. Click Add…
  3. Type the name Backup database on save and make sure Enabled and Initially on are checked
  4. Under the Events tab click Add…, select Saving database file and click OK (the conditions can be ignored)
  5. Under the Actions tab click Add…, select Execute command line / URL and fill in the following:
    File/URL: cp
    Arguments: "{DB_PATH}" "{DB_PATH}.{DT_SIMPLE}"
    Wait for exit: checked
  6. Click OK a few times

This will create a backup of the database file every time you save it (taken just before the save). The backup will be in the same directory as the original, with the current timestamp appended to the name. If you prefer to back up to a different directory, use "/path/to/backups/{DB_BASENAME}.{DT_SIMPLE}" as the second argument instead.
