Preventing Upgrade Headaches with Selenium WebDriver

Upgrading software can be an exercise in rising pulses, clenching teeth, and immense stress. That’s where we can turn to an unexpected testing friend: Selenium.

  • Jackson Murtha
  • Sep. 05 2013

If you’ve done a major software upgrade, you probably know the feeling: your pulse rises and you clench your teeth until you fearfully click the upgrade button. If you are lucky, everything goes according to plan and the anxiety ends up being your only problem, but you can never know for sure until the process is complete.

If you’ve written the software, you’ve hopefully run tests on your code, and you’re familiar enough with the changes to have confidence. However, if your code is a module, theme, front-end, or feature of some third-party application, it can be very difficult to know how changes will affect your work.

Introducing Selenium

When testing my own applications, Selenium is my weapon of choice. Selenium, a browser automation tool that lets you write functional checks in your own programming language, is traditionally used to save time with BDD, integration, and regression testing during application development. Through a bit of experimentation, though, I’ve found that Selenium can also save a lot of time and reduce problems when testing third-party software upgrades.

To use Selenium efficiently for upgrade testing, we have to take a different approach from the one you might use during development: one that requires abusing a few features to make more tests with a shorter lifespan. If you’re not already familiar with Selenium, you may want to read an intro with best practices for testing under normal conditions.

Our test environment will be my very basic (and fictional) Imperial Weights & Measures fan blog. The site is a very old WordPress installation with a custom theme and module, so I’ll be writing my tests in PHP, using the latest Facebook Webdriver-PHP and PHPUnit.

NOTE: There are at least five different versions of the PHP bindings for WebDriver. The official Selenium docs recommend another good PHP bindings library, the Element-34 / Adam Goucher fork of the Facebook bindings. However, as of Selenium 2.34, this version is not functioning. For those interested, I’ve patched a version on GitHub, but there are still issues. The latest Facebook bindings are the only ones I know to fully work with the latest WebDriver release.

Deciding what to test

Weights and Measures directory structure

The first step in making sure that your software is fully functional after an upgrade is figuring out what it means for it to be functional. This may sound simple, but it is the most difficult part of the testing process. If you already have existing tests (especially if they are in a functional testing DSL like Gherkin), much of your work is done. If not, run through the important, easily broken, and very commonly used features of the application.

You will want a list of pages that users can visit or actions they can perform on the application through an API or URL access. For sites, a sitemap is a good place to start, if you have it. There are also a number of crawling and sitemap generation tools which will allow you to generate a list of URLs. Some of these need to be deployed in advance to log user page visits for analysis.

The following are example pages we’ll be testing for Weights and Measures. Keep these in an easily consumable format such as newline-delimited text, CSV, or perhaps XML, or in separate files if you want to source different pages for different test groups.

File: tests/test-assets/pages.txt


For web applications, examining the API access logs and the application routes may be helpful. Analytics and access statistics will also be helpful, not just for providing a list of URLs but also for helping determine their priority for testing. Of course, be sure to focus on the features and pages relevant to your particular development, module, or feature as opposed to the entire application.

On my site, for instance, I’ve made sure to include my most complicated or heavily customized areas of the site, including the search page “/?s=” and my custom form module page “/convert-your-units”. High-traffic pages (such as the index and about page) and pages with high consequences for breakage (such as login, admin-login, and feed pages) are also listed. I’ve also been careful to get a couple of representative pages from each page type, such as those posts with the “2013” URLs.
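A newline-delimited page list like this is easy to load into a test suite. The following is a minimal sketch, not the loader used for Weights and Measures: the function name loadPageList(), the "#" comment convention, and the sample paths other than “/?s=” and “/convert-your-units” are assumptions made for illustration.

```php
<?php
// Hypothetical helper: turn newline-delimited text into a clean list of URIs.
function loadPageList($text) {
    $pages = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        // skip blank lines and lines starting with "#" (treated as comments here)
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $pages[$line] = true; // array keys de-duplicate repeated entries
    }
    return array_keys($pages);
}

// Sample contents; apart from the search and converter pages, the paths are made up.
$sample = "/\n/about\n\n# customized areas\n/?s=\n/convert-your-units\n/about\n";
print_r(loadPageList($sample));
```

Skipping blank lines and duplicates up front keeps a stray trailing newline in the file from turning into an extra (empty) page to test.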

This might seem like a lot of work and unnecessary data if you don’t do a lot of testing, but it will pay off. When performing an upgrade test, you have a rare benefit of starting from a known “good” state. Any action and everything shown should (hopefully) be in working order, so you can base tests off data from the existing state.

In any case, the more examples of user actions or pages available, the more likely you are to catch a problem in the testing phase before it is apparent to the user. Spending time gathering information about the features up front will mean more automated tests can be performed, saving humans time spent testing and debugging later on.

Preparing a testing environment

Once you have a virtual or cloned environment set up, you’ll need PHPUnit installed, PHP-Webdriver downloaded and included, and the Selenium server jar running.

NOTE: Let me make clear that I assume you’re performing these tests on a backed-up version of the site or application: on a virtual machine, a local development environment, or a development or testing server. Performing testing of this sort on a live production site would not be very helpful, as any test failures discovered would already be on the live site. Furthermore, if a vulnerability or performance bug is introduced, running a few hundred (or thousand) tests could mean the equivalent of launching a denial of service attack on your own site. Also, some browsers require additional driver extensions and setup. See the WebDriver docs for more information.

Tests directory structure for a basic project.

Next, the tests need a place to live. Most, if not all, of the tests we run will be strictly black-box, so they may not need to be alongside the application code. However, I like to create a place within the application directory in case I can save time by including existing regression tests. For Weights and Measures, I’ll make an “upgrade-tests” folder, which sits alongside my “unit-tests” directory. Keeping the upgrade tests separate from my unit tests makes it easier to run one set without the other, and to discard temporary tests for the upgrade.

This brings up another big difference between traditional testing and testing in preparation for an upgrade. For most tests, it is best to think about the essential feature and to build well-designed tests that won’t break with slight interface changes over the long term. For upgrade testing, most tests will be quick to write and disposable. Tests can and should be extremely brittle, because we are far more concerned with even minor unexpected changes than with maintaining the tests long term.

Creating a test bootstrap script

Before settling down to write my first test, I typically write a barebones Bash script to launch my test runner from the command line and do some extra setup. (PHP’s exec is blocking, and script output is simply returned as a string once exec completes, which makes real-time command line output more trouble than it is worth.) This isn’t required, as the testing language bindings all provide setup and teardown methods, but it’s a great way to introduce on-the-fly parameters to the test suite. In this case, we’ll use those parameters to loop through the suite multiple times with slight variations, turning just a couple of methods into dozens or more tests.

File: tests/upgrade-test-runner

for size in 1366x768 1024x768 800x600 320x480; do
    for browser in firefox chrome htmlunit; do
        phpunit $1 --browser=$browser --dimensions=$size $2
    done
done

Like the rest of our upgrade test components, the script is quick and dirty but will save a lot of time post-upgrade. The test suite it launches can be switched between two modes at the command line: “scrape” and “test”. Scrape mode will be our greatest abuse of Selenium, and our greatest time saver. When we run with the scrape flag (before the upgrade), Selenium will run the upgrade test cases, but I’m hijacking it to store the values it finds rather than assert and perform actual tests. This would be a poor decision if the application weren’t in a known-good state, or if we wanted to maintain tests over time, but for the upgrade test case it’s perfect.
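Stripped of Selenium, the scrape/test idea comes down to a single branch: record a value before the upgrade, compare against it afterwards. The sketch below illustrates only that branch. The function name checkAgainstBaseline() and the temporary file location are hypothetical and are not part of the test suite in this article.

```php
<?php
// Hypothetical helper illustrating the two modes; not part of any library.
function checkAgainstBaseline($mode, $baselineFile, $currentValue) {
    if ($mode === 'scrape') {
        // before the upgrade: record the known-good value
        file_put_contents($baselineFile, $currentValue);
        return true;
    }
    // after the upgrade: compare with what was recorded earlier
    if (!is_file($baselineFile)) {
        return false; // nothing was scraped, so the check cannot pass
    }
    return trim(file_get_contents($baselineFile)) === trim($currentValue);
}

// Demo: scrape once, then run two comparisons against the stored value.
$file = sys_get_temp_dir() . '/baseline-demo-' . uniqid() . '.txt';
checkAgainstBaseline('scrape', $file, 'Imperial Weights and Measures');
var_dump(checkAgainstBaseline('test', $file, 'Imperial Weights and Measures'));
var_dump(checkAgainstBaseline('test', $file, 'Metric Weights and Measures'));
unlink($file);
```

Treating a missing baseline as a failure is a deliberate choice here: a page that was never scraped should not silently pass after the upgrade.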

Writing the first tests

The type and structure of tests for verifying functionality is entirely pragmatic. Quick unit tests or isolated functionality can be helpful. Again, if you already have a set of tests drawn up, by all means include them in the test suite to be run after the upgrade. The tests developed especially for the upgrade will probably look a bit different.

Here’s the quick first set of tests for Weights and Measures:

File: tests/upgrade-tests/WebDriverUpgradeTestCase.php

class WebDriverUpgradeTestCase extends PHPUnit_Framework_TestCase
{
    public $assetDirectory = '';
    public $originalPath = '';
    public $upgradePath = '';
    protected $webDriver;
    public $pages = array();
    public $baseUrl = '';
    public $mode = 'test';

    public function setUp() {

        // get command line arguments
        // we need this to get args going through phpunit
        global $argv;
        $options = $this->getOpts($argv);

        $dimensions = explode('x', isset($options['dimensions']) ? $options['dimensions'] : '800x600');
        $browser = isset($options['browser']) ? $options['browser'] : 'firefox';
        $webDriverHost = isset($options['webdriver-host']) ? $options['webdriver-host'] : 'http://localhost:4444/wd/hub';
        $this->mode = isset($options['mode']) && $options['mode'] === 'scrape' ? 'scrape' : 'test';

        // scrape mode writes to the "original" assets, test mode to the "upgrade" assets
        $this->originalPath = 'test-assets/original/'.$browser.'/'.implode('x',$dimensions);
        $this->assetDirectory = $this->upgradePath = 'test-assets/upgrade/'.$browser.'/'.implode('x',$dimensions);
        if($this->mode === 'scrape') {
            $this->assetDirectory = $this->originalPath;
        }

        // create any missing asset directories
        $assetPaths = array($this->originalPath.'/page-data', $this->originalPath.'/images',
                            $this->upgradePath.'/page-data', $this->upgradePath.'/images');
        foreach($assetPaths as $assetPath) {
            if(!file_exists($assetPath)) {
                mkdir($assetPath, 0755, true);
            }
        }

        $capabilities = array(WebDriverCapabilityType::BROWSER_NAME => $browser);
        $this->webDriver = new WebDriver($webDriverHost, $capabilities);

        // set window size
        $this->webDriver->manage()->window()->setSize(new \WebDriverDimension((int)$dimensions[0], (int)$dimensions[1]));
    }

    // pick the --name=value options we care about out of the raw argument list
    private function getOpts($argv=null) {
        $args = is_array($argv) ? $argv : array();
        $options = array();
        $availableOptions = array('dimensions', 'browser', 'webdriver-host', 'mode');
        foreach($args as $arg) {
            foreach($availableOptions as $opt) {
                if(1 === preg_match('/^--'.$opt.'=.*/', $arg)) {
                    // limit of 2 keeps any "=" inside the value intact
                    $argValues = explode('=', $arg, 2);
                    $options[$opt] = $argValues[1];
                }
            }
        }
        return $options;
    }

    public function actionLogIn($user = 'test_user', $password = 'test_password') {
        // ...
    }

    public function scrapeValue($uri, $selectorType, $selector) {
        $elementText = $this->webDriver->findElement(\WebDriverBy::$selectorType($selector))->getText();
        $selectorFile = $this->originalPath.'/page-data/'.preg_replace('/[^A-Za-z0-9_-]/', '_', $uri.'-'.$selectorType.$selector);
        echo "\nNo test to complete...scraping {$elementText} to file {$selectorFile}\n";
        // output element text value to file
        file_put_contents($selectorFile, $elementText);
    }

    public function getScrapedValue($uri, $selectorType, $selector) {
        $selectorFile = $this->originalPath.'/page-data/'.preg_replace('/[^A-Za-z0-9_-]/', '_', $uri.'-'.$selectorType.$selector);
        return trim(file_get_contents($selectorFile));
    }

    public function assertEqualsOriginalText($uri, $selectorType, $selector) {
        // in scrape mode, store the value instead of asserting against it
        if($this->mode === 'scrape') {
            return $this->scrapeValue($uri, $selectorType, $selector);
        }
        $elementText = $this->webDriver->findElement(\WebDriverBy::$selectorType($selector))->getText();
        // dump assertion information to console, add a test listener for more robust alternative
        echo "\nAssertion ".__METHOD__.": {$selectorType}({$selector}):{$elementText}\n";
        return $this->assertSame($this->getScrapedValue($uri, $selectorType, $selector), $elementText);
    }

    public function assertElementExists($selectorType, $selector) {
        $elementExists = false;
        $element = '';
        try {
            $element = $this->webDriver->findElement(\WebDriverBy::$selectorType($selector));
        } catch(NoSuchElementWebDriverError $error) {
            $elementExists = false;
        }
        if($element instanceof WebDriverElement) {
            $elementExists = true;
        }
        $this->assertTrue($elementExists);
    }
}

File: tests/upgrade-tests/IndexPageTest.php

class IndexPageTest extends WebDriverUpgradeTestCase
{
    public function setUp() {
        // the parent creates the WebDriver session and reads the mode
        parent::setUp();
        // ...
    }

    public function tearDown() {
        // ...
    }

    /**
     * @test
     */
    public function testPageTitle($uri = '/') {
        $this->assertEqualsOriginalText($uri, 'cssSelector', 'h1#header a');
    }

    /**
     * @test
     */
    public function testConvertYourUnits() {
        $webDriver = $this->webDriver;
        // wait up to 10 seconds, polling every 500 ms, for the link to be clickable
        $webDriver->wait(10, 500)->until(function ($webDriver) {
            return $webDriver->findElement(\WebDriverBy::linkText('convert your units'))->click();
        });
        // ...

        // there should be 419 hogsheads in 100 kiloliters
        $this->assertEquals( '419', $webDriver->findElement(\WebDriverBy::cssSelector('.hogsheads-value'))->getText() );
    }
}

File: tests/upgrade-tests/AllPagesTest.php

class AllPagesTest extends WebDriverUpgradeTestCase
{
    public function setUp() {
        parent::setUp();
        // trim() avoids an empty trailing entry when the file ends with a newline
        $this->pages = explode("\n", trim(file_get_contents('./test-assets/pages.txt')));
    }

    /**
     * @test
     */
    public function testFooter() {
        foreach($this->pages as $page) {
            $this->checkFooter($page);
        }
    }

    public function checkFooter($url) {
        $uri = '/';
        $selector = ' cite';
        // ...
        $this->assertElementExists('cssSelector', $selector);
    }
}

Notice how I’ve extended the normal PHPUnit test case class to hack in the ‘scrape’ and ‘test’ modes, which are read in the setUp method. The custom assert method, assertEqualsOriginalText(), collects data in scrape mode and compares against the collected data in test mode. Additionally, setUp manages the session (including the browser type, window size, and more, which are passed in by the bootstrap script). Finally, setUp in AllPagesTest also reads the list of pages we gathered earlier.

Let’s go through these tests in IndexPageTest.php and AllPagesTest.php, one by one.

  • testPageTitle() is simple and traditional looking, verifying that the page heading element contains the expected value. For this test, a simple assertEquals() against ‘Imperial Weights and Measures’ would have been quicker than storing and retrieving the value with the scrape-aware assert, but this variation demonstrates that you can make a decent generic test that stores and retrieves the element value for the test mode assertion. We use the inherited method assertEqualsOriginalText(), which stores the known good value when we use the ‘scrape’ flag and compares against that value when we’re running the real test.
  • testConvertYourUnits() shows a long, complicated procedural example. Longer functionality tests that require many steps are normally very slow to run and difficult to keep up to date, which makes them a poor fit for everyday Selenium-based tests. When testing an upgrade, however, you can save time by including a number of procedural steps in one case, since the tests will be discarded anyway. This test is very delicate, because it is actually testing many different steps (login with username and password, click of a link, selection of a button, input of text, click of a checkbox, and finally verification of a calculation), but it only needs to survive one upgrade, and the function should remain identical, so a long test works well. That said, the goal is to not waste time writing tests, so be sure to extend test classes, utilize setup methods, and isolate functionality that prepares test functions to be run if you will be using them for multiple assertions. This is why I’ve isolated functions like actionLogIn() for reuse in multiple tests.
  • testFooter() can be run over all the pages in the sitemap, so it’s been written in a very simplified way. While the first two tests are reused by the bootstrap script dozens of times for different browsers, resolutions, and other adjustable factors, a generic test like this can produce hundreds of test assertions over the site. It might seem like overkill to test for repeated items like a footer on each page, but the rare exception where a repeated item misbehaves on a small subset of pages is much better found automatically and quickly than in the production environment after the upgrade. I’m using a loop to run checkFooter() over all the URLs listed in the $pages attribute array. While loops and conditionals should be avoided for most tests, I find they’re more likely to be worth the tradeoffs for the throwaway upgrade tests. Note that this test will fail on the first broken assertion in the loop. On the upside, it is extremely quick to write this way, and it is less annoying if your footer breaks everywhere (or nowhere) after the upgrade. The downside is that it can be more difficult to determine what caused the test to fail, since each assertion is run under the same test name. For simplicity’s sake, I’m just using echo to indicate which test last ran. It’s much better, and not much harder, to implement a test listener to output this information and store it in the PHPUnit XML report. (An alternative approach I have used is to dispatch URLs to tests in the bootstrap script, which forces PHPUnit to create a separate assertion for each test without having to create separate test methods.)
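One way to soften the "fails on the first broken assertion" tradeoff, without writing a test listener, is to catch each failure inside the loop and report them together at the end. The sketch below shows the idea with a stand-in check. The function name checkAllPages() and the sample pages are hypothetical, and it relies on the assumption that a failed check surfaces as an exception (which, as far as I know, is how PHPUnit reports assertion failures).

```php
<?php
// Hypothetical helper: run one check over every page and record each failure
// instead of stopping at the first one.
function checkAllPages(array $pages, $check) {
    $failures = array();
    foreach ($pages as $page) {
        try {
            call_user_func($check, $page);
        } catch (Exception $e) {
            // remember which page failed and why, then keep going
            $failures[$page] = $e->getMessage();
        }
    }
    return $failures;
}

// Stand-in for a real footer check; it fails for one made-up page.
$footerCheck = function ($page) {
    if ($page === '/feed') {
        throw new Exception('footer element not found');
    }
};

$failures = checkAllPages(array('/', '/about', '/feed'), $footerCheck);
foreach ($failures as $page => $message) {
    echo "FAILED {$page}: {$message}\n";
}
```

Inside a PHPUnit test method, a single closing assertion on the collected array (for example, that it is empty) would still fail the test, but the failure output could then name every affected page rather than only the first.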

Using image processing in tests

One last great feature of Selenium for upgrade testing is the screenshot. Typically, Selenium’s screenshot functionality is used in test failure reporting to gather what a page looked like during a failed assertion. However, for upgrade testing the screenshot is used to make test assertions. I’ll be extending the screenshot functionality a little bit with ImageMagick, so if you’re following along in PHP, you will need ImageMagick and the PHP Bindings for ImageMagick installed.

These sorts of tests may not work well if your upgrade includes theme changes (though with a little image processing to crop sections of the page, they can still be quite helpful if you know what to expect). Weights and Measures uses a custom theme on the public side, however, so nothing should change. I want to guarantee that everything on the site is still perfectly crafted down to the pixel, so I’ve added these small methods to the test case class:

File: tests/upgrade-tests/WebDriverUpgradeTestCase.php

class WebDriverUpgradeTestCase extends PHPUnit_Framework_TestCase
{
    // characters outside this set are replaced when building asset file names
    // (assumed to match the pattern used for the page-data files)
    public $assetReplaceRegex = '/[^A-Za-z0-9_-]/';

    public function assertEqualsOriginalFullScreenshot($threshold = 0) {
        // in scrape mode, store the screenshot instead of comparing it
        if($this->mode === 'scrape') {
            return $this->scrapeFullScreenshot();
        }
        $uri = $this->webDriver->getCurrentUrl();
        $imageSubPath = '/images/full-'.preg_replace($this->assetReplaceRegex, '_', $uri);
        $originalImg = new Imagick($this->originalPath. $imageSubPath . '.png');
        $upgradeImgPath = $this->upgradePath. $imageSubPath .'.png';

        // take a screenshot with WebDriver and load it into an ImageMagick object
        // (assumes takeScreenshot() saves the image to the path it is given)
        $this->webDriver->takeScreenshot($upgradeImgPath);
        $upgradeImg = new Imagick($upgradeImgPath);

        // generate a comparison between upgrade and original screenshot
        $compareResult = $originalImg->compareImages($upgradeImg, Imagick::METRIC_MEANSQUAREERROR);
        // save the image diff, just in case we want to view it after a failure
        file_put_contents($this->upgradePath.$imageSubPath.'-diff.png', $compareResult[0]);

        // make sure that the numerical difference between our images is below our given threshold
        $this->assertLessThanOrEqual($threshold, $compareResult[1]);
    }

    public function scrapeFullScreenshot() {
        $uri = $this->webDriver->getCurrentUrl();
        $imgFilename = $this->originalPath. '/images/full-'.preg_replace($this->assetReplaceRegex, '_', $uri).'.png';
        // store the known-good screenshot (same takeScreenshot() assumption as above)
        $this->webDriver->takeScreenshot($imgFilename);
    }

    public function getScrapedFullScreenshot() {
        $uri = $this->webDriver->getCurrentUrl();
        $imgFilename = $this->originalPath. '/images/full-'.preg_replace($this->assetReplaceRegex, '_', $uri).'.png';
        return $imgFilename;
    }
}

File: tests/upgrade-tests/IndexPageTest.php

class IndexPageTest extends WebDriverUpgradeTestCase
{
    /**
     * @test
     */
    public function testIndexScreenshot() {
        $uri = '/';
        // ...
        $this->assertEqualsOriginalFullScreenshot();
    }
}

As with the footer test, these screenshot tests can cover every page in the page list without special, separate tests. For these tests, I’m also taking and storing the screenshots in ‘scrape’ mode (unique for each browser and each resolution) for later comparison. There are a couple of ways to compare the images to verify that the site remains the same after the upgrade. The simplest is to compare the screenshots’ base64-encoded strings with PHPUnit’s assertEquals() method to ensure they are identical. (These are the strings output by takeScreenshot() in most of the PHP bindings, but if your binding gives you an image file instead, you can read the file to create the encoded string.)
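When the screenshots are already saved as files, the same all-or-nothing comparison can be done by hashing the two files instead of comparing encoded strings. This is a small sketch of that variation, not the comparison used in this article. The function name filesAreIdentical() is hypothetical, and the demo uses plain bytes as stand-ins for real screenshots.

```php
<?php
// Hypothetical helper: true when two files have byte-for-byte identical contents.
function filesAreIdentical($pathA, $pathB) {
    return hash_file('sha256', $pathA) === hash_file('sha256', $pathB);
}

// Demo with stand-in files (plain bytes rather than real screenshots).
$dir = sys_get_temp_dir();
$original = $dir . '/original-' . uniqid() . '.png';
$upgrade  = $dir . '/upgrade-' . uniqid() . '.png';
file_put_contents($original, 'stand-in image bytes');
file_put_contents($upgrade, 'stand-in image bytes');
var_dump(filesAreIdentical($original, $upgrade));
unlink($original);
unlink($upgrade);
```

Like the string comparison, this only answers "identical or not": a single changed pixel fails the check and gives no hint about where the change is.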

The alternative I’ve implemented is to match up the two images and have ImageMagick perform a visual image diff. This is implemented in WebDriverUpgradeTestCase::assertEqualsOriginalFullScreenshot(), which stores the screenshot in scrape mode and generates a difference image in test mode (see figure below). I’ve used the inherited PHPUnit assertLessThanOrEqual() so that each test can pass in a threshold for what level of difference is acceptable. The default value of 0 means that the original and post-upgrade browser screenshots must be identical.
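To make the threshold idea concrete, here is a simplified mean squared error calculation over two short lists of pixel values. It is only meant to show why a threshold of 0 demands identical images and a larger one allows small differences. It is not how ImageMagick computes its metric internally, and the function name and sample values are made up.

```php
<?php
// Simplified mean squared error between two equally sized lists of pixel
// values (0-255). Illustrative only; Imagick's metric is computed differently.
function meanSquaredError(array $a, array $b) {
    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('pixel lists must be non-empty and the same size');
    }
    $sum = 0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff; // squaring makes every difference count as positive
    }
    return $sum / count($a);
}

$original  = array(10, 10, 200, 200);
$upgrade   = array(10, 12, 200, 200); // one pixel changed slightly
$threshold = 1.5;                     // hypothetical tolerance

$error = meanSquaredError($original, $upgrade);
echo $error <= $threshold ? "within threshold\n" : "difference too large\n";
```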

Blog test example

As you can see, the difference between the two images is shown as a red blob.

Image comparisons are extremely brittle and therefore best avoided for normal Selenium tests, but if your upgraded application theme should remain visually consistent, they can be extremely helpful. Slight changes in the look of your application can cause these tests to fail, however, so be mindful of even small expected visual changes when designing these.

For instance, say I know that the copyright year in the page footer is changing after the upgrade, and I also have an advertising block which features random content that will not appear the same for each test. To handle this, I would simply modify the test to include an ImageMagick transform which draws a solid block over the pixel areas that I don’t want to consider for comparison. In scrape mode, I would save the screenshot after the transform, and in test mode I would apply the same transform before the screenshot compare assertion. This allows testing the rest of these pages with screenshots without unnecessary failures from dynamic content or expected changes from the upgrade. Additionally, I can still write more fitting tests targeting content not covered by the image comparison tests, so I make sure that the advertisement and copyright appear as expected.
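The masking step can be pictured without any image library at all. In the sketch below, a "screenshot" is just a small grid of numbers, and the hypothetical maskRegion() function overwrites a rectangle with a fixed value on both images before they are compared. With ImageMagick, the equivalent would be drawing a filled rectangle on each screenshot. The grid contents and coordinates are made up for illustration.

```php
<?php
// Hypothetical helper: blank out a rectangle in a 2D grid of pixel values
// so that region is ignored by a later comparison.
function maskRegion(array $pixels, $x, $y, $width, $height, $fill = 0) {
    for ($row = $y; $row < $y + $height; $row++) {
        for ($col = $x; $col < $x + $width; $col++) {
            // only touch cells that exist, so an oversized mask is harmless
            if (isset($pixels[$row][$col])) {
                $pixels[$row][$col] = $fill;
            }
        }
    }
    return $pixels;
}

// Two tiny 3x3 "screenshots" that differ only in the bottom-right pixel,
// standing in for an ad block or a changed copyright year.
$before = array(array(1, 1, 1), array(1, 1, 1), array(1, 1, 7));
$after  = array(array(1, 1, 1), array(1, 1, 1), array(1, 1, 9));

// Compare the raw grids, then compare again with the dynamic region masked.
var_dump($before === $after);
var_dump(maskRegion($before, 2, 2, 1, 1) === maskRegion($after, 2, 2, 1, 1));
```

The important detail is that the same mask is applied on both sides of the comparison, which matches saving the transformed screenshot in scrape mode and transforming again in test mode.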

If I hadn’t anticipated a smaller change, such as an altered element in the footer, I could cause a lot of tests to fail. By using a diff threshold, I can provide a little leeway for very small differences, and I can see that the difference is nearly the same quantity across all pages, even if the tests ultimately fail.

If this were a normal test suite that I intended to reuse time and again, I should rewrite this test to make it more flexible. Even for upgrade purposes, I should rewrite and rerun the test if the reason for failure is ambiguous, or if the failure is widespread across the site. In this case, however, the failure is fully explained and I have manually confirmed that the result is desired, so there is no need to alter or rerun the test suite if I intend to discard it.

Overcoming (some) good testing habits

By now, you’ve probably noticed a theme which might sound distasteful if you have a lot of testing experience: we’re ignoring the rules.

Since the arrival of Selenium WebDriver, I’ve been a heavy supporter of the PageObject model which separates out components of the page so that they are easier to maintain as pages change. Here, that model adds extra layers which are more difficult to loop through, and it is a waste of time to account for application changes which shouldn’t occur over the brief lifespan of our tests. Instead, all of our tests have been quick, dirty, and — above all — pragmatic.

For this reason, Selenium IDE also works very well for authoring quick tests in the browser to be added to your upgrade test suite, though I normally avoid it for long-term tests. (If using Selenium IDE, be sure to use the new WebDriver formatters for writing tests. As of this writing, these formatters do not yet support the rewritten Facebook PHP WebDriver bindings.)

If you’re an experienced tester, you may think the whole idea of testing a third-party upgrade is a bit preposterous. We’ve learned to avoid testing others’ code as a rule. Appropriate test coverage, the theory goes, should ensure your portion of the application’s functionality — and testing vendor-provided code is chasing an impossible target.

However, testing upgrades is an exercise in pragmatism, and regardless of any principled arguments about where the domain of testing ends, upgrades can break things — which becomes a problem when your application still needs to work.

This isn’t a license for bad testing practices in general, though. If you stumble upon a feature or unit of code which is stable, required, and without a test, taking the time to write good tests and including them in your integration or unit test suite will save you time in the long run. Don’t waste time writing a disposable test for application components with a long lifespan. The upgrade testing process may present you with a perfect opportunity for an audit of your existing tests.

Understanding upgrade tests

My experiments testing upgrades with Selenium have been helpful in reducing anxiety when upgrading client software. They can also be helpful in saving a lot of time over a completely manual approach to testing after an upgrade, if a few differences from traditional browser testing are kept in mind. There is still some experimenting to do to get the best efficiency out of the process, but I’ve distilled the important conceptual differences between upgrade and traditional in-browser testing to the following comparison:

Upgrade Tests

  • tests third-party code, including how it interacts with your code
  • can be brittle, but is quick to write
  • correct function may be assumed in existing site before upgrade
  • no need to be quick to run; can be infrequently run
  • integration testing — not isolated
  • test “good” values often generated by Selenium
  • reusable across many pages
  • tests are longer and more procedural
  • failed tests may not require rewriting if results are acceptable

Traditional Automated Browser Testing

  • tests only (or primarily) your own code
  • stable, reliable over time, and well thought-out
  • correct function defined in test
  • quick to run and frequently run
  • well-isolated
  • test “good” values determined manually
  • reusable over time
  • tests are shorter, simpler and isolated
  • failed tests must be passed, rewritten, or removed