Testing visual appearance with Cucumber + Watir

One of the great things about Cucumber and Watir is that they let you write functional tests that are decoupled from the UI. With page objects, the definition of how the UI works is kept separate from the tests themselves. If the UI changes, you only need to update the corresponding page object, and all of your tests still run.

Such tests provide an excellent safety harness: you can make changes with the confidence that you are not breaking other features. The only problem is that the tests verify the functionality of the pages, but not their visuals. We were missing a safety harness for CSS changes.

For this purpose I implemented a set of tests that verify the visual appearance of certain core pages. These tests catch CSS changes that accidentally affect pages other than the one being worked on.

Since these tests are very brittle by definition, I do not recommend having a lot of them. You need to identify a few core pages from your application that rarely change their visual appearance, but which still cover the most important parts of your CSS.

For the impatient, the example code is on GitHub.

Visual comparison

An example of a visual comparison feature file is below:

@visual
Feature:  Visual appearance of codeforhire.com

Background:
  Given my browser resolution is 1024x600

Scenario:  Visual appearance of codeforhire.com banner
   When I open "https://codeforhire.com/"
   Then I should see the contents of "codeforhire.png"

The first thing to note is the @visual tag. For reasons explained later, you should exclude these tests from most test runs, such as when developers are running the tests themselves.

Next, in the Background you set the browser to the desired size. Since we’re developing a mobile app, we wanted to control the size of the browser viewport, but Watir-Webdriver can only resize the browser window. Therefore I added a helper method that iteratively resizes the window until the viewport is of the desired size.
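The iterative resizing can be sketched roughly as follows. This is not the helper from the example code, only a minimal illustration that assumes `browser` behaves like a Watir browser (`window.resize_to` and `execute_script`). The names `viewport_size`, `resize_viewport` and `MAX_RESIZE_ATTEMPTS` are hypothetical.

```ruby
# Hypothetical limit on how many times we try to correct the window size.
MAX_RESIZE_ATTEMPTS = 5

# Reads the current viewport size from the page via JavaScript.
def viewport_size(browser)
  width  = browser.execute_script('return window.innerWidth;')
  height = browser.execute_script('return window.innerHeight;')
  [width.to_i, height.to_i]
end

# Resizes the browser window until the viewport has the requested size.
# Returns true on success and raises if the size cannot be reached.
def resize_viewport(browser, target_width, target_height)
  # First guess: assume the window has no toolbars or borders.
  window_width  = target_width
  window_height = target_height

  MAX_RESIZE_ATTEMPTS.times do
    browser.window.resize_to(window_width, window_height)
    actual_width, actual_height = viewport_size(browser)
    return true if actual_width == target_width && actual_height == target_height

    # Adjust the window by the amount the viewport is off.
    window_width  += target_width - actual_width
    window_height += target_height - actual_height
  end

  raise "Could not resize viewport to #{target_width}x#{target_height}"
end
```

A step definition for the Background step would then call something like `resize_viewport(browser, 1024, 600)`.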

Finally, you bring the application to the desired state, and assert as a post-condition that the rendered page matches a specified PNG image.

Combating non-determinism

The immediate problem with this approach is non-determinism. I identified three sources of difference that can cause the match to fail: functional differences, platform differences, and random differences.

Functional differences result from the intended behavior of your application. Maybe you display the current time or a random ad on the screen. Some of these issues can be corrected by other means, such as forcing a specific time, but others may be more difficult to work around.

Platform differences arise from the varying conditions under which the test is run. Different browsers render pages somewhat differently, and even the same browser renders differently on different systems. Things that may affect the rendering include installed fonts, display drivers, the browser version, etc.

Even if the system is exactly the same, there may be random variations in the rendering from run to run. I’ve noticed two cases where Chrome causes random variation: image colors may be very slightly off when scaling images, and SVG images may render one pixel wider or narrower from time to time.

Because of such differences, the tests cannot be expected to run repeatably on just any machine. Running the tests should be limited to a specific computer in your continuous integration system. Remember that these are tests on a few specifically chosen pages that rarely change.

To work around these issues, I added two features to the image comparison:

Each pixel channel in the image may vary by one color value (COLOR_DELTA). This is sufficient at least for Chrome.
Transparent areas in the comparison image are ignored. For example, you could make the area of an advertisement transparent so that it is ignored in the browser image.

The code also allows having a small portion of pixels differ (VISUAL_DELTA), but I prefer to keep this at zero, and manually select the areas where pixel variation is allowed.
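The comparison rules above can be sketched in plain Ruby. This is an illustration under stated assumptions, not the code from the example repository: images are assumed to be already decoded into flat arrays of `[r, g, b, a]` pixels (for instance with a PNG library), only fully transparent pixels in the expected image are ignored, and the method names `pixels_match?` and `images_match?` are hypothetical.

```ruby
COLOR_DELTA  = 1   # maximum allowed difference per color channel
VISUAL_DELTA = 0.0 # maximum allowed fraction of differing pixels

# A pixel is an [r, g, b, a] array with channel values from 0 to 255.
# Assumption: only fully transparent expected pixels (alpha 0) are ignored.
def pixels_match?(expected, actual)
  return true if expected[3].zero?

  # Compare the red, green and blue channels one by one.
  expected.first(3).zip(actual.first(3)).all? do |e, a|
    (e - a).abs <= COLOR_DELTA
  end
end

# Both arguments are flat arrays of pixels from images of the same size.
def images_match?(expected_pixels, actual_pixels)
  return false unless expected_pixels.size == actual_pixels.size
  return true if expected_pixels.empty?

  differing = expected_pixels.zip(actual_pixels).count do |e, a|
    !pixels_match?(e, a)
  end

  differing.to_f / expected_pixels.size <= VISUAL_DELTA
end
```

With `VISUAL_DELTA` at zero, a single pixel that is off by more than `COLOR_DELTA` outside a transparent area is enough to fail the comparison.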

Whenever the images differ too much, the test fails. The images are then embedded in the Cucumber report and stored by our CI as build artifacts.

With these measures, the visual tests complement functional tests extremely well. While it’s unreasonable to test all CSS styling automatically, this provides a good safety measure against accidental changes that affect the core pages. And whenever an intentional change is made, we let the test fail in CI, take the actual image it produced, and use that to replace the expected image.