Abstracting Selenium Tests using Page Object Model

The Page Object Model is one of the most widely used design patterns in the Selenium WebDriver community. In the early days of functional automation, WinRunner and QTP were the leading tools. They leaned toward procedural programming because the languages they supported, TSL and VBScript, were not object-oriented.

Functional automation became more challenging with the advent of Web 2.0, as many older applications migrated from desktop UIs (.NET and Swing) to the web. Teams were looking for a lightweight functional automation tool that focused primarily on web UIs and offered more powerful scripting. Then, in 2004, Selenium was born. Because most of the developers who picked up Selenium had prior experience with tools like QTP, they applied the same procedural design principles, along with keyword-driven or data-driven approaches.

When Selenium 2.0 (WebDriver) was released, the community adopted it very quickly and soon realized that the old automation principles were not going to work with WebDriver. So the Selenium core team itself came up with a new design pattern called the Page Object Model (POM).

Page objects introduce an abstraction layer into your Selenium tests and provide a programmatic API for driving and interacting with a UI. This makes automation code easier to read and maintain. Each page of your AUT (Application Under Test) is mapped to a class in your code, and each method of that class can be treated as a service offered by the page object. For example, think of Amazon.com's home page: it offers services such as searching for a product or navigating to a specific product category.

To design a page object, we first need to understand that a page can typically be divided either functionally or structurally. Let’s take the Gmail login page as an example.

Functional Implementation:

In this implementation we split the page up by functionality, with methods like LoginToGmailAsValidUser and LoginToGmailAsInvalidUser.

See the code snippet below:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class GmailLoginPage {
  private final WebDriver driver;

  // Locators for the elements this page interacts with
  private final By usernameloc = By.id("Email");
  private final By passwordloc = By.id("Passwd");
  private final By loginButtonloc = By.id("signIn");

  // Page Object constructor which passes the driver context forward
  public GmailLoginPage(WebDriver driver) {
    this.driver = driver;
  }

  // Valid credentials navigate to the inbox, so the Inbox page object is returned
  public InboxPage LoginToGmailAsValidUser(String username, String password) {
    driver.findElement(usernameloc).sendKeys(username);
    driver.findElement(passwordloc).sendKeys(password);
    driver.findElement(loginButtonloc).click();
    return new InboxPage(driver);
  }

  // Invalid credentials keep the user on the login page, so this page is returned
  public GmailLoginPage LoginToGmailAsInvalidUser(String username, String password) {
    driver.findElement(usernameloc).sendKeys(username);
    driver.findElement(passwordloc).sendKeys(password);
    driver.findElement(loginButtonloc).click();
    return this;
  }
}

The benefit of this approach is that the page structure is completely abstracted from the test layer. Suppose an extra “stay signed in” checkbox is added to the login page and must be selected when logging a user in. We simply add the code that handles the checkbox to the same method of the page class. The test layer is not affected, because the tests still call the same method to log in.
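The checkbox change can be sketched as follows. This is a minimal illustration, not real Selenium code: RecordingBrowser is a hypothetical stand-in for WebDriver that only records actions so the sketch runs without a browser, and the element id "staySignedIn" is an assumption. With WebDriver, each call would be a driver.findElement(...) interaction as in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver: it records actions instead of driving a browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

class InboxPage {
    private final RecordingBrowser browser;

    InboxPage(RecordingBrowser browser) {
        this.browser = browser;
    }
}

class GmailLoginPage {
    private final RecordingBrowser browser;

    GmailLoginPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    // Same name, parameters, and return type as before the checkbox existed.
    InboxPage LoginToGmailAsValidUser(String username, String password) {
        browser.type("Email", username);
        browser.type("Passwd", password);
        browser.click("staySignedIn"); // new step; the element id is assumed
        browser.click("signIn");
        return new InboxPage(browser);
    }
}
```

A test that calls new GmailLoginPage(browser).LoginToGmailAsValidUser("user", "secret") compiles and runs unchanged, because only the method body gained a step.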

Structural Implementation:

In this approach the page is divided structurally, with one method for each element on the page that we need to interact with. See the code snippet below:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class GmailLoginPage {
  private final WebDriver driver;

  // Locators for the elements this page interacts with
  private final By usernameloc = By.id("Email");
  private final By passwordloc = By.id("Passwd");
  private final By loginButtonloc = By.id("signIn");

  // Page Object constructor which passes the driver context forward
  public GmailLoginPage(WebDriver driver) {
    this.driver = driver;
  }

  // Typing stays on the login page, so the same page object is returned
  public GmailLoginPage typeUsername(String username) {
    driver.findElement(usernameloc).sendKeys(username);
    return this;
  }

  public GmailLoginPage typePassword(String password) {
    driver.findElement(passwordloc).sendKeys(password);
    return this;
  }

  // Clicking sign-in navigates to the inbox, so the Inbox page object is returned
  public InboxPage clickOnSignin() {
    driver.findElement(loginButtonloc).click();
    return new InboxPage(driver);
  }
}

This approach gives more flexibility, as it exposes every element of the page and lets the test layer combine them as required. Its major flaw shows up when extra fields are added to the screen, as in the example above where a new checkbox appears on the login screen. We would then need to add an extra method to the page class to handle the checkbox and also update every test case that includes the log-in steps. Updating code in multiple places for a single change is cumbersome.
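To make the ripple effect concrete, here is a minimal sketch of the structural page after the checkbox is added, together with the chain a test would have to write. RecordingBrowser is a hypothetical stand-in for WebDriver that only records actions, and checkStaySignedIn, StructuralLoginExample, and the "staySignedIn" id are assumed names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver: it records actions instead of driving a browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

class InboxPage {
    private final RecordingBrowser browser;

    InboxPage(RecordingBrowser browser) {
        this.browser = browser;
    }
}

class GmailLoginPage {
    private final RecordingBrowser browser;

    GmailLoginPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    GmailLoginPage typeUsername(String username) {
        browser.type("Email", username);
        return this;
    }

    GmailLoginPage typePassword(String password) {
        browser.type("Passwd", password);
        return this;
    }

    // New method needed for the new checkbox (the element id is assumed).
    GmailLoginPage checkStaySignedIn() {
        browser.click("staySignedIn");
        return this;
    }

    InboxPage clickOnSignin() {
        browser.click("signIn");
        return new InboxPage(browser);
    }
}

class StructuralLoginExample {
    // What each test that logs in has to spell out. The checkStaySignedIn()
    // call must be added to every such chain once the checkbox exists.
    static InboxPage loginAsTestWould(RecordingBrowser browser) {
        return new GmailLoginPage(browser)
                .typeUsername("user")
                .typePassword("secret")
                .checkStaySignedIn()
                .clickOnSignin();
    }
}
```

Because the steps live in the test layer, the one-line addition in the chain is repeated in every test that logs in, which is the maintenance cost described above.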

Challenges in Implementing Page Object Model:

The Page Object Model is a very effective design pattern, provided it is implemented correctly. Here are some of the more complex scenarios.

  1. Whenever a page object service (method) results in navigation to a new page, the method should return that new page. Take the Gmail login example above: the method LoginToGmail() leads us to the Inbox page, so its return type should be InboxPage. See the code below:

  public InboxPage LoginToGmail(String username, String password) {
    driver.findElement(usernameloc).sendKeys(username);
    driver.findElement(passwordloc).sendKeys(password);
    driver.findElement(loginButtonloc).click();
    // Logging in navigates to the inbox, so return its page object
    return new InboxPage(driver);
  }

When we call this method in our test, it looks like this:

GmailLoginPage loginPage = new GmailLoginPage(driver);
InboxPage inboxPage = loginPage.LoginToGmail("username", "password");

As you can see, we only created the object for the starting page of our application (GmailLoginPage); from there it works like a chain reaction. Each subsequent page object (like the Inbox page) is returned by the page service that causes the navigation.

  2. The scenario above is simple to implement in most cases, but some scenarios are trickier. Take the same Gmail login example, where we created a page method, LoginToGmail, that takes the username and password as arguments and returns the next page, the Inbox page.

That was the happy path. Now suppose we need to test the negative scenario: if we pass invalid credentials, the application keeps us on the login page and displays an “invalid credentials” error. How do we incorporate this behavior into the same method when it always returns the object of the next page?

This problem is easy to handle in a dynamically typed language like Ruby, where the same method can return objects of different types. We add an extra argument that tells the method whether the credentials are valid or invalid, and return a different page object based on its value.

This is how we can tackle the problem in Ruby:

 def LoginToGmail(username, password, usertype)
     # USERNAME, PASSWORD and SUBMIT are assumed to be locator constants on the page class
     @driver.find_element(USERNAME).send_keys(username)
     @driver.find_element(PASSWORD).send_keys(password)
     @driver.find_element(SUBMIT).submit
     if usertype == 'valid'
         return InboxPage.new(@driver)
     else
         return GmailLoginPage.new(@driver)
     end
 end

To achieve the same thing in statically typed languages like Java and C#, the straightforward option is to split this method into two separate methods, because a method in these languages has a single declared return type. This is how we do it in Java:

  // Valid credentials navigate to the inbox
  public InboxPage LoginToGmailAsValidUser(String username, String password) {
    driver.findElement(usernameloc).sendKeys(username);
    driver.findElement(passwordloc).sendKeys(password);
    driver.findElement(loginButtonloc).click();
    return new InboxPage(driver);
  }

  // Invalid credentials keep the user on the login page
  public GmailLoginPage LoginToGmailAsInvalidUser(String username, String password) {
    driver.findElement(usernameloc).sendKeys(username);
    driver.findElement(passwordloc).sendKeys(password);
    driver.findElement(loginButtonloc).click();
    return this;
  }

As we can see, the first method is meant for valid credentials and returns the Inbox page object. The second method is meant for invalid credentials and returns the same GmailLoginPage object.

  3. The third and last scenario covers POM implementation for pages with overlapping functionality. Take Gmail as an example: after logging in, we see the search box at the top of the Inbox page, which lets us search mail threads, contacts, and so on. The same functionality is available on most Gmail pages, such as the Inbox, Compose, and Settings pages. In other words, search is common to many pages. The problem is deciding which page class should hold the search code.

Some people place this code in common utilities so that it can be called from any page, but that deviates from the Page Object Model. A better approach is to create a SearchPage class, declared abstract because it never needs to be instantiated on its own. Every page that offers search extends SearchPage and so inherits the search code. Here we have applied OOP (Object-Oriented Programming) concepts to reduce duplicate code.
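A minimal sketch of the abstract SearchPage idea might look like the following. It is an illustration under assumptions rather than real Selenium code: RecordingBrowser is a hypothetical stand-in for WebDriver that only records actions, and SearchResultsPage, SettingsPage, and the "searchBox" and "searchButton" ids are assumed names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver: it records actions instead of driving a browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

// Holds the shared search service once. It is abstract because search is
// never a page of its own; it only exists as part of concrete pages.
abstract class SearchPage {
    protected final RecordingBrowser browser;

    protected SearchPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    // Element ids are assumed for illustration.
    public SearchResultsPage search(String query) {
        browser.type("searchBox", query);
        browser.click("searchButton");
        return new SearchResultsPage(browser);
    }
}

// Assumed page shown after a search; it offers search as well.
class SearchResultsPage extends SearchPage {
    SearchResultsPage(RecordingBrowser browser) {
        super(browser);
    }
}

// Concrete pages inherit search() without duplicating any code.
class InboxPage extends SearchPage {
    InboxPage(RecordingBrowser browser) {
        super(browser);
    }
}

class SettingsPage extends SearchPage {
    SettingsPage(RecordingBrowser browser) {
        super(browser);
    }
}
```

A test can then call new InboxPage(browser).search("invoice") or new SettingsPage(browser).search("filters"), and both run the same inherited code.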

In this post, I have introduced the Page Object Model and some complex scenarios around it. In the next post in this series, I will try to cover more advanced concepts like PageFactory, LoadableComponent, and @CacheLookup.