October 31, 2014

Test Execution in Distributed Environment using ZooKeeper – A POC

In this post I walk through a proof of concept (POC) that uses ZooKeeper’s configuration management capabilities to run different (or the same) automated tests in parallel on different test machines in a network.

Before starting with the POC, let’s take a quick look at ZooKeeper and the features we need in order to understand the POC.

Introduction to ZooKeeper

Apache ZooKeeper is an open source coordination service for distributed applications. It runs on Java and provides APIs in Java and C. These APIs help in designing synchronization, configuration management, grouping and naming services for a distributed application.

While designing a distributed application we have to worry about centralized configuration, synchronization and serialization. There are also problems that are intrinsic to distributed applications, e.g. partial failures, race conditions, and the handling of session timeouts and retries. ZooKeeper helps take care of these difficult parts so that we can focus on functionality.

The ZooKeeper service usually runs on an odd number of networked machines, called a ZooKeeper cluster (or ensemble). Once running, the servers elect a leader and the remaining servers become followers. Clients can connect to the leader as well as to the followers.


Fig. 1 Physical architecture of ZooKeeper
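
One way to see why an odd ensemble size is the usual choice: ZooKeeper needs a strict majority of servers to be available, so adding a fourth server to a three-server ensemble does not let it survive any more failures. The small sketch below only does this arithmetic; the class and method names are illustrative and not part of the ZooKeeper API.

```java
// Sketch: majority-quorum arithmetic for an ensemble of a given size.
// QuorumMath and its methods are hypothetical names used for illustration.
class QuorumMath {
    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can fail while a majority is still available. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " servers: quorum=" + quorumSize(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Running it shows that 3 and 4 servers both tolerate one failure, and 5 and 6 both tolerate two, which is why the even sizes add cost without adding resilience.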

At a very high level, think of ZooKeeper as a centralized repository where our distributed applications can store and read data. Reads can be served by any of the ZooKeeper server nodes (whether leader or follower), but all writes go through the leader.

ZooKeeper’s data model is a hierarchical namespace, like a filesystem with just directories. These directories are called znodes. A znode can store data (only up to 1 MB) and other znodes. The limit exists because ZooKeeper is not designed to work as a database; all we need to store is a small amount of information shared between the parts of a distributed application.


Fig. 2 ZooKeeper Data Model

We can create four types of znodes –

Persistent – These znodes remain until they are explicitly deleted.

Ephemeral – These are session-specific znodes. When the session of the client that created them ends, they are automatically deleted.

Persistent_Sequential – This is a hybrid of the persistent and sequential node types. A sequential znode is given a sequence number by ZooKeeper as part of its name: the value of a monotonically increasing counter is appended to the end of the name.

Ephemeral_Sequential – This is a hybrid of the ephemeral and sequential node types.
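
To make the naming of sequential znodes concrete, the sketch below imitates how a counter is appended to the requested path. ZooKeeper formats the counter as ten zero-padded digits; everything else here (the class name, the in-memory counter, the /zoo/task- prefix) is an assumption for illustration, since in a real ensemble the counter is maintained by the server for each parent znode.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: imitates the names given to sequential znodes under one parent.
// SequentialNameDemo is a hypothetical stand-in, not a ZooKeeper class.
class SequentialNameDemo {
    // Stand-in for the counter the ZooKeeper server keeps for a parent znode.
    private final AtomicInteger counter = new AtomicInteger(0);

    /** Returns the requested path with a 10-digit, zero-padded counter appended. */
    String nextName(String requestedPath) {
        return requestedPath + String.format(Locale.ROOT, "%010d", counter.getAndIncrement());
    }

    public static void main(String[] args) {
        SequentialNameDemo parent = new SequentialNameDemo();
        System.out.println(parent.nextName("/zoo/task-")); // /zoo/task-0000000000
        System.out.println(parent.nextName("/zoo/task-")); // /zoo/task-0000000001
    }
}
```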

Watches in ZooKeeper: This is a built-in callback system through which a client can ask ZooKeeper to notify it when –

  • A znode is created or deleted.
  • Data on a znode is changed.
  • A child is added to or deleted from a znode.

The read operations (exists, getChildren and getData) may set watches. The watches are triggered by the write operations (create, delete and setData).

  • A watch set by an exists operation is triggered when the watched znode is created, deleted or updated.
  • A watch set by a getData operation is triggered when the watched znode’s data is updated or the znode is deleted.
  • A watch set by a getChildren operation is triggered when a child of the znode is created or deleted, or the znode itself is deleted.

A watch event does not carry the new data. To discover which child has changed or what data on the znode has changed, the notified client should call getChildren or getData. In both cases the state of the znode might have changed again between receiving the watch event and performing the read operation, and the programmer has to handle such cases.
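
The gap between notification and read matters because watches set through exists, getData and getChildren are one-time triggers: after a watch fires it is gone until the client sets it again. The sketch below imitates this with a tiny in-memory znode. It does not use the ZooKeeper client library, and OneShotWatchDemo and DataWatcher are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an in-memory znode whose data watches fire once and are then dropped.
class OneShotWatchDemo {
    /** Hypothetical callback, standing in for a ZooKeeper Watcher. */
    interface DataWatcher { void dataChanged(); }

    private String data;
    private final List<DataWatcher> pendingWatches = new ArrayList<>();

    /** Reads the current data and, if a watcher is given, leaves a one-time watch. */
    String getData(DataWatcher watcher) {
        if (watcher != null) {
            pendingWatches.add(watcher);
        }
        return data;
    }

    /** Writes new data, fires each pending watch once and forgets it. */
    void setData(String newData) {
        data = newData;
        List<DataWatcher> toFire = new ArrayList<>(pendingWatches);
        pendingWatches.clear();
        for (DataWatcher w : toFire) {
            w.dataChanged();
        }
    }

    public static void main(String[] args) {
        OneShotWatchDemo znode = new OneShotWatchDemo();
        final int[] notifications = {0};
        znode.getData(() -> notifications[0]++); // read and set a watch
        znode.setData("a");                      // fires the watch
        znode.setData("b");                      // no watch pending, nothing fires
        // The client was notified once, but a fresh read returns the latest value.
        System.out.println(notifications[0] + " notification, data=" + znode.getData(null));
    }
}
```

In this model a client that wants to keep following changes has to read again and pass a watcher on every read, and it should treat whatever that read returns as the current state rather than assume it matches the event that triggered it.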

A ZooKeeper Watcher object can serve two purposes –

  1. It can be notified of changes in the ZooKeeper connection state (connecting, connected or closed). The default watcher passed into the ZooKeeper object’s constructor is used for state changes.
  2. It can be notified of changes in znodes, as discussed above. Znode changes may either use a dedicated Watcher instance, or use the default watcher if we call the form of the read operation that takes a Boolean flag to specify whether to set a watch.

Building Configuration Service using ZooKeeper

Now let’s discuss how we can use ZooKeeper as a configuration service to execute automated tests in a distributed environment. Below are the basic features we will implement as part of this service –

– Machines register themselves as test runner machines.

– There should be a test scheduler service through which a user can schedule a test by specifying

  • what test to run,
  • on which browser to run it,
  • on which test machine to run it.

– Only one test can be scheduled on a test machine at a time.

– The user should be able to control (pause, resume, stop) a running test on a machine.

– Negative scenarios should be handled safely, e.g.

  • The user should not be able to schedule a test on a machine that is not online.
  • The user should not be able to perform invalid state transitions, e.g. from Start -> Resume.

High Level Design for POC


For the POC we will use just one node running the ZooKeeper service. The configuration service will keep all its znodes under the root node – /zoo

Application under Test (SearchEngineCrawler.java)

For simplicity’s sake, the POC runs a Selenium WebDriver test that opens a search engine website (Google/Bing) in the Chrome/Firefox browser, searches for “Three Pillar global” and counts the number of search results.

Test Machine (ZKC_RegisterRunner.java)

Test machines will be ZooKeeper clients. A registration request is a ZooKeeper connection request. Upon registration, the test machine creates a znode named after its hostname under the root node, e.g. /zoo/NDI-LAP-253. The data on this node stays null until a test is scheduled.

Using a ZooKeeper watch for the NodeDataChanged event type, test machines will be notified when a new test is scheduled on them or when there is a state change, e.g. from Start -> Pause.

Once the test is complete on the test machine, the client automatically nullifies the configuration data on the corresponding ZooKeeper node.

Test Schedulers (ZKC_ScheduleTest.java)

Test schedulers will be ZooKeeper clients. A test scheduler can schedule a test by specifying three mandatory parameters – which online test machine to run on, what test to run and which browser to run it on.

Internally, the test scheduler updates the configuration data in the test machine’s node to ‘testName|browser|START’. The test machine in turn reads this data through the ZooKeeper callback (watch) and invokes the test on the requested browser. As long as the test machine’s node holds non-null data, the test scheduler can’t schedule a new test on it. Once the test is complete, the test runner machine removes the configuration data from its znode, and test schedulers can schedule further tests.
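
The ‘testName|browser|START’ convention and the "free only while the data is empty" rule can be sketched in plain Java as below. This is an illustration rather than the POC’s actual code: the class name TestRunConfig, the UTF-8 encoding and the assumption that no field contains a ‘|’ character are all mine.

```java
import java.nio.charset.StandardCharsets;

// Sketch: encoding and decoding the znode data described above.
// TestRunConfig is a hypothetical helper, not a class from the POC.
class TestRunConfig {
    final String testName;
    final String browser;
    final String state;

    TestRunConfig(String testName, String browser, String state) {
        this.testName = testName;
        this.browser = browser;
        this.state = state;
    }

    /** Encodes the configuration as testName|browser|state (UTF-8 is an assumption). */
    byte[] toBytes() {
        return (testName + "|" + browser + "|" + state).getBytes(StandardCharsets.UTF_8);
    }

    /** A machine can accept a new test only while its znode holds no data. */
    static boolean isFree(byte[] znodeData) {
        return znodeData == null || znodeData.length == 0;
    }

    /** Decodes znode data; returns null when no test is scheduled. */
    static TestRunConfig fromBytes(byte[] znodeData) {
        if (isFree(znodeData)) {
            return null;
        }
        // Limit -1 keeps empty trailing fields so malformed data is detected.
        String[] parts = new String(znodeData, StandardCharsets.UTF_8).split("\\|", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected testName|browser|state");
        }
        return new TestRunConfig(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        byte[] data = new TestRunConfig("SearchEngineCrawler", "firefox", "START").toBytes();
        TestRunConfig decoded = fromBytes(data);
        System.out.println(decoded.testName + " on " + decoded.browser + " is " + decoded.state);
        System.out.println("free before scheduling: " + isFree(null));
        System.out.println("free after scheduling: " + isFree(data));
    }
}
```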

Test Controllers (ZKC_TestController.java)

Test controllers will also be ZooKeeper clients which will be able to pause, resume and stop an already running test on a test machine. Internally, the test controller will update the configuration data on a test machine znode. The test machine will receive the updated configuration data and accordingly stop, pause, or resume a test.

All validations, such as whether the test machine name is valid and whether the state transition is allowed, are done at the client (test controller) level before the znode data is updated. Once the data is updated, how to actually stop, pause and resume a test is implemented in the classes responsible for executing the test (in this case the WebDriver test).
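
A client-side transition check of this kind could look like the sketch below. The only rule taken from the post is that Start -> Resume is invalid; the rest of the transition table (for example, that a stopped test cannot be resumed) is an assumption for illustration, as are the class and enum names.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Sketch: validating a requested state change before writing it to the znode.
// The allowed-transition table is assumed, apart from START -> RESUME being invalid.
class TestStateMachine {
    enum State { START, PAUSE, RESUME, STOP }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.START, EnumSet.of(State.PAUSE, State.STOP));
        ALLOWED.put(State.PAUSE, EnumSet.of(State.RESUME, State.STOP));
        ALLOWED.put(State.RESUME, EnumSet.of(State.PAUSE, State.STOP));
        ALLOWED.put(State.STOP, EnumSet.noneOf(State.class));
    }

    /** Returns true if a test currently in state 'from' may move to state 'to'. */
    static boolean canTransition(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println("START -> PAUSE allowed: " + canTransition(State.START, State.PAUSE));
        System.out.println("START -> RESUME allowed: " + canTransition(State.START, State.RESUME));
    }
}
```

A controller built this way would read the current state from the znode data, call canTransition, and write the new state only when the check passes.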

UML Diagram


Code Snippets

Connecting to ZooKeeper –



Creating a persistent node on ZooKeeper –


Erasing all nodes of ZooKeeper using recursion –
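
The idea behind the recursion can be sketched without a running server. ZooKeeper refuses to delete a znode that still has children, so the children have to be removed first, depth first. The class below is an in-memory stand-in for the znode tree, not the POC’s code and not the ZooKeeper client; with the real client the same shape would be built on its getChildren and delete calls. The /zoo/NDI-LAP-253/log and /zoo/other-runner paths are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: depth-first deletion on an in-memory model of a znode tree.
class RecursiveDeleteDemo {
    // Maps each znode path to the names of its children.
    private final Map<String, List<String>> childrenOf = new HashMap<>();
    final List<String> deletionOrder = new ArrayList<>();

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash > 0 ? path.substring(0, slash) : null;
    }

    private static String nameOf(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    void create(String path) {
        String parent = parentOf(path);
        if (parent != null) {
            if (!childrenOf.containsKey(parent)) {
                throw new IllegalStateException("No parent for " + path);
            }
            childrenOf.get(parent).add(nameOf(path));
        }
        childrenOf.put(path, new ArrayList<String>());
    }

    boolean exists(String path) {
        return childrenOf.containsKey(path);
    }

    /** Deletes one znode; like ZooKeeper, refuses while it still has children. */
    void delete(String path) {
        List<String> kids = childrenOf.get(path);
        if (kids == null) {
            throw new IllegalStateException("No node: " + path);
        }
        if (!kids.isEmpty()) {
            throw new IllegalStateException("Not empty: " + path);
        }
        childrenOf.remove(path);
        String parent = parentOf(path);
        if (parent != null) {
            childrenOf.get(parent).remove(nameOf(path));
        }
        deletionOrder.add(path);
    }

    /** Removes every descendant first, then the node itself. */
    void deleteRecursively(String path) {
        // Copy the child list because delete() modifies it during the loop.
        for (String child : new ArrayList<>(childrenOf.get(path))) {
            deleteRecursively(path + "/" + child);
        }
        delete(path);
    }

    public static void main(String[] args) {
        RecursiveDeleteDemo tree = new RecursiveDeleteDemo();
        tree.create("/zoo");
        tree.create("/zoo/NDI-LAP-253");
        tree.create("/zoo/NDI-LAP-253/log");
        tree.create("/zoo/other-runner");
        tree.deleteRecursively("/zoo");
        System.out.println(tree.deletionOrder);
    }
}
```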


Setting Watch


Steps to Run the Demo (on a Windows machine) –

We will be using three ZooKeeper client nodes –

  • Test scheduler/controller machine
  • Test runner machine – 1
  • Test runner machine – 2

For the demo, the test scheduler/controller machine also acts as the ZooKeeper server, so this machine works both as ZooKeeper host and as client.

On all three client machines, edit the hosts file under the C:\Windows\System32\drivers\etc directory and append this IP address/hostname mapping –

<IP address of test scheduler machine> ZooKeeperServerAddress

Run the ZooKeeper server on the test scheduler/controller machine.

On Test runner machine 1

  • Navigate to the bin folder inside the resources folder using the command line.
  • Register this test runner machine by running the following command –

java -cp .;..\lib\* ZKC_RegisterRunner


Perform the same step on test runner machine 2.

Switch to the test scheduler machine and schedule a test on test runner machine 1 –

java -cp .;..\lib\* ZKC_ScheduleTest NDI-LAP-253 firefox https://www.google.com


NDI-LAP-253 – is the IP address/hostname of test runner machine 1.

firefox – is the browser on which to run the test.

URL – is the URL under test.


After the above step, the test should be running on test runner machine 1.

We can pause, resume or stop the test running on test runner machine 1 with the following command on the test controller (same as scheduler) machine, passing the desired action (RESUME in this example) as the last argument.

java -cp .;..\lib\* ZKC_TestController NDI-LAP-253 RESUME