Abstracting Selenium Tests using Page Object Model

The Page Object Model is one of the most widely used design patterns in the Selenium WebDriver community. In the early days of functional automation, WinRunner and QTP were the leading tools. They followed a procedural programming approach, because the languages they supported, TSL and VBScript, were not based on Object Oriented Programming (OOP).

Functional automation became more challenging with the advent of Web 2.0, as most of the older applications migrated from a typical desktop-based UI (.NET and Swing) to the web. Everybody was looking for a lightweight functional automation tool that focused primarily on web-based UIs and was more powerful in terms of scripting. Then, in 2004, a tool called Selenium was born. As most of the developers who picked up Selenium had prior experience with tools like QTP, they applied the same procedural design principles, along with the keyword-driven or data-driven approach.

When Selenium 2.0 (WebDriver) was released, the community adopted it very swiftly, and people started to realize that the old automation principles were not going to work with WebDriver. So the Selenium core team themselves came up with a new design pattern called the Page Object Model (POM).

A PageObject introduces an abstraction layer into your Selenium tests and provides a programmatic API to drive and interact with a UI. It makes automation code easier to read and maintain. Each page of your AUT (Application Under Test) is mapped to a class in your code, and each method of that class can be treated as a service offered by the PageObject. As an example, think of Amazon.com’s home page: it offers services such as searching for a product, navigating to a specific product category, etc.

To design a PageObject, we first need to understand that a page can typically be divided either functionally or structurally. Let’s take the example of the Gmail login page.

Functional Implementation:

In this implementation we split the page up by functionality, with methods like LoginToGmailAsValidUser and LoginToGmailAsInvalidUser.

See the code snippet below:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class GmailLoginPage {
  private final WebDriver driver;

  // Locators for the elements this page interacts with
  private final By usernameloc = By.id("Email");
  private final By passwordloc = By.id("Passwd");
  private final By loginButtonloc = By.id("signIn");

  // Page Object constructor which passes the driver context forward
  public GmailLoginPage(WebDriver driver) {
      this.driver = driver;
  }

  // Valid credentials navigate to the inbox, so the next page is returned
  public InboxPage LoginToGmailAsValidUser(String username, String password) {
      driver.findElement(usernameloc).sendKeys(username);
      driver.findElement(passwordloc).sendKeys(password);
      driver.findElement(loginButtonloc).click();
      return new InboxPage(driver);
  }

  // Invalid credentials keep the user on the login page, so the same page is returned
  public GmailLoginPage LoginToGmailAsInvalidUser(String username, String password) {
      driver.findElement(usernameloc).sendKeys(username);
      driver.findElement(passwordloc).sendKeys(password);
      driver.findElement(loginButtonloc).click();
      return this;
  }
}

The benefit of this approach is that the page structure is completely abstracted from the test layer. For example, suppose an extra “stay signed in” checkbox is added to the login page and must be selected while logging a user in. We simply add the code that handles this checkbox to the same method of the page class. The test layer is not affected, because the tests still call the same method to log in.
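The checkbox example can be sketched as follows. This is a minimal illustration, not real Selenium code: a hypothetical RecordingBrowser class stands in for WebDriver so that the sketch runs without a browser, and the element id PersistentCookie for the “stay signed in” checkbox is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver: it only records the actions performed,
// so the sketch runs without a real browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

class InboxPage {
    private final RecordingBrowser browser;

    InboxPage(RecordingBrowser browser) {
        this.browser = browser;
    }
}

class GmailLoginPage {
    private final RecordingBrowser browser;

    GmailLoginPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    // Same signature as before the checkbox existed; only the body changed.
    InboxPage LoginToGmailAsValidUser(String username, String password) {
        browser.type("Email", username);
        browser.type("Passwd", password);
        browser.click("PersistentCookie"); // assumed id of the new checkbox
        browser.click("signIn");
        return new InboxPage(browser);
    }
}

class LoginTest {
    // The test layer did not change when the checkbox was added.
    static InboxPage logIn(RecordingBrowser browser) {
        return new GmailLoginPage(browser).LoginToGmailAsValidUser("user", "secret");
    }
}
```

The only line that differs from the pre-checkbox version is the extra click inside the page class; LoginTest is untouched.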

Structural Implementation:

In this approach the page is divided structurally, with one method for each element on the page that we need to interact with. See the code snippet below:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class GmailLoginPage {
  private final WebDriver driver;

  // Locators for the elements this page interacts with
  private final By usernameloc = By.id("Email");
  private final By passwordloc = By.id("Passwd");
  private final By loginButtonloc = By.id("signIn");

  // Page Object constructor which passes the driver context forward
  public GmailLoginPage(WebDriver driver) {
      this.driver = driver;
  }

  // Typing does not navigate, so the same page is returned
  public GmailLoginPage typeUsername(String username) {
      driver.findElement(usernameloc).sendKeys(username);
      return this;
  }

  public GmailLoginPage typePassword(String password) {
      driver.findElement(passwordloc).sendKeys(password);
      return this;
  }

  // Clicking the button navigates to the inbox, so the next page is returned
  public InboxPage clickOnSignin() {
      driver.findElement(loginButtonloc).click();
      return new InboxPage(driver);
  }
}

This approach gives more flexibility, as it exposes all the elements of the page, which can then be combined in the test layer as required. Its major flaw shows up when extra fields are added to the screen, as in the above example where a new checkbox is added to the login screen. We would then need to add an extra method to the page class to handle this checkbox, and we would also need to update every test case that includes the log-in functionality. Updating the code in multiple places for a single change is really cumbersome.
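The same checkbox change looks different with the structural approach. The sketch below is a minimal illustration under the same assumptions as before: RecordingBrowser is a hypothetical stand-in for WebDriver, and checkStaySignedIn and the element id PersistentCookie are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver that records actions instead of driving a browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

class InboxPage {
    private final RecordingBrowser browser;

    InboxPage(RecordingBrowser browser) {
        this.browser = browser;
    }
}

class GmailLoginPage {
    private final RecordingBrowser browser;

    GmailLoginPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    GmailLoginPage typeUsername(String username) {
        browser.type("Email", username);
        return this;
    }

    GmailLoginPage typePassword(String password) {
        browser.type("Passwd", password);
        return this;
    }

    // New method required by the new checkbox (name and id are assumptions).
    GmailLoginPage checkStaySignedIn() {
        browser.click("PersistentCookie");
        return this;
    }

    InboxPage clickOnSignin() {
        browser.click("signIn");
        return new InboxPage(browser);
    }
}

class StructuralLoginTest {
    static InboxPage logIn(RecordingBrowser browser) {
        return new GmailLoginPage(browser)
                .typeUsername("user")
                .typePassword("secret")
                .checkStaySignedIn() // every test that logs in must add this call
                .clickOnSignin();
    }
}
```

Here the change touches both layers: the page class gains a method, and each test that logs in has to insert the new step into its own chain.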

Challenges in Implementing Page Object Model:

The Page Object Model is a very effective design pattern, provided it is implemented correctly. I would like to cover some complex scenarios.

  1. Whenever a PageObject service (method) results in navigation to a new page, that new page should be returned by the method. Take the above example of the Gmail login: the method LoginToGmail() leads us to the Inbox page, so it should have a return type of InboxPage. See the code below:
  public InboxPage LoginToGmail(String username, String password) {
      driver.findElement(usernameloc).sendKeys(username);
      driver.findElement(passwordloc).sendKeys(password);
      driver.findElement(loginButtonloc).click();
      return new InboxPage(driver);
  }

So when we call this method in our test it will look like:

GmailLoginPage loginPage = new GmailLoginPage(driver);
InboxPage inboxPage = loginPage.LoginToGmail("username", "password");

As you can see, we only created the object for the starting page (GmailLoginPage) of our application, and from there it works like a chain reaction. Each following page object (like the Inbox page) is returned automatically by the page service that causes the navigation.
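To illustrate the chain reaction across more than one navigation, here is a minimal sketch. RecordingBrowser is a hypothetical stand-in for WebDriver, and ComposePage, openCompose and the element id compose are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver that records actions instead of driving a browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

class ComposePage {
    private final RecordingBrowser browser;

    ComposePage(RecordingBrowser browser) {
        this.browser = browser;
    }
}

class InboxPage {
    private final RecordingBrowser browser;

    InboxPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    // This service navigates away from the inbox, so it returns the next page object.
    ComposePage openCompose() {
        browser.click("compose"); // assumed element id
        return new ComposePage(browser);
    }
}

class GmailLoginPage {
    private final RecordingBrowser browser;

    GmailLoginPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    InboxPage LoginToGmail(String username, String password) {
        browser.type("Email", username);
        browser.type("Passwd", password);
        browser.click("signIn");
        return new InboxPage(browser);
    }
}

class ChainedNavigationTest {
    static ComposePage logInAndCompose(RecordingBrowser browser) {
        // Only the starting page is constructed by the test itself.
        GmailLoginPage loginPage = new GmailLoginPage(browser);
        InboxPage inboxPage = loginPage.LoginToGmail("username", "password");
        return inboxPage.openCompose();
    }
}
```

The test never calls a constructor for InboxPage or ComposePage; each one is handed over by the service that caused the navigation.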

  2. The above scenario is simple to implement in most cases, but some scenarios are trickier. Let’s take the same example of the Gmail login, where we created a page method, LoginToGmail, which takes the username and password as arguments and returns the next page, the Inbox page.

This was the happy path. But suppose we need to test the negative scenario: if we pass invalid credentials, the application keeps us on the same login page and displays an “invalid credentials” error. How do we incorporate this behavior into the same method, when it always returns the object of the next page?

This problem is easily handled in dynamically typed languages like Ruby, where a method does not declare a return type and can therefore return objects of different classes. We add an extra argument to the method that tells us whether we are passing valid or invalid credentials, and on the basis of this value we return a different page object.

This is how we will tackle this problem in Ruby.

 def LoginToGmail(username, password, usertype)
     # Same element ids as in the Java snippets above
     @driver.find_element(id: 'Email').send_keys(username)
     @driver.find_element(id: 'Passwd').send_keys(password)
     @driver.find_element(id: 'signIn').click
     if usertype == 'valid'
         return InboxPage.new(@driver)
     else
         return GmailLoginPage.new(@driver)
     end
 end

If we have to achieve the same thing in statically typed languages like Java and C#, the cleanest option is to split this method into two separate methods, because a method in these languages declares a single return type. This is how we do it in Java:

  public InboxPage LoginToGmailAsValidUser(String username, String password) {
      driver.findElement(usernameloc).sendKeys(username);
      driver.findElement(passwordloc).sendKeys(password);
      driver.findElement(loginButtonloc).click();
      return new InboxPage(driver);
  }
  public GmailLoginPage LoginToGmailAsInvalidUser(String username, String password) {
      driver.findElement(usernameloc).sendKeys(username);
      driver.findElement(passwordloc).sendKeys(password);
      driver.findElement(loginButtonloc).click();
      return this;
  }

As we can see, the first method is meant only for valid credentials and returns the Inbox page object. The second method is meant only for invalid credentials and returns the same GmailLoginPage object.
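The sketch below shows how the two methods might be exercised from a test. It is an illustration under stated assumptions: FakeGmail is a hypothetical stand-in for both WebDriver and the application (it accepts a single hard-coded account), getErrorMessage is an invented helper, and the constructor check in InboxPage is one possible guard against calling the "valid" method with credentials that turn out to be rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for WebDriver plus the application under test.
// It accepts exactly one account so the sketch runs without a real browser.
class FakeGmail {
    private final Map<String, String> fields = new HashMap<>();
    String currentPage = "login";
    String errorText = "";

    void type(String elementId, String text) {
        fields.put(elementId, text);
    }

    void click(String elementId) {
        if (!"signIn".equals(elementId)) {
            return;
        }
        boolean valid = "user".equals(fields.get("Email"))
                && "secret".equals(fields.get("Passwd"));
        currentPage = valid ? "inbox" : "login";
        errorText = valid ? "" : "invalid credentials";
    }
}

class InboxPage {
    InboxPage(FakeGmail browser) {
        // Guard: fail fast if the login did not actually reach the inbox.
        if (!"inbox".equals(browser.currentPage)) {
            throw new IllegalStateException("Not on the Inbox page: " + browser.currentPage);
        }
    }
}

class GmailLoginPage {
    private final FakeGmail browser;

    GmailLoginPage(FakeGmail browser) {
        this.browser = browser;
    }

    // Shared steps, so the two public services differ only in what they return.
    private void submit(String username, String password) {
        browser.type("Email", username);
        browser.type("Passwd", password);
        browser.click("signIn");
    }

    InboxPage LoginToGmailAsValidUser(String username, String password) {
        submit(username, password);
        return new InboxPage(browser);
    }

    GmailLoginPage LoginToGmailAsInvalidUser(String username, String password) {
        submit(username, password);
        return this;
    }

    // Hypothetical helper that lets a test assert on the error shown on the login page.
    String getErrorMessage() {
        return browser.errorText;
    }
}
```

A negative test would call LoginToGmailAsInvalidUser and then assert on getErrorMessage(), while a positive test calls LoginToGmailAsValidUser and continues with the returned InboxPage. If the "valid" method is called with rejected credentials, the guard in the InboxPage constructor throws and the test fails at that point.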

  3. In the third and last scenario, I will cover the POM implementation for pages which have overlapping functionality. Take the example of Gmail: after logging in, we see the search box at the top of the Inbox page, which allows us to search for things like mail threads, contacts, etc. This functionality is available on most of the Gmail pages, like the Inbox, Compose, and Settings pages. So we can say search is a functionality which is common to many pages. Now the problem is deciding in which page class to place the code for this search functionality.

Some people place this code in common utils so that it can be called from any page, but this is not the right approach, as it deviates from the Page Object Model. The best way to handle this is to create a SearchPage class, which should be abstract because we never need to instantiate it on its own. All the pages that have the search functionality extend this SearchPage class so that they can use the search code internally. Here we have applied OOP (Object Oriented Programming) concepts to reduce duplicate code.
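A minimal sketch of the abstract SearchPage idea follows. RecordingBrowser is a hypothetical stand-in for WebDriver so that the example runs without a browser, and the element ids q and searchButton are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebDriver that records actions instead of driving a browser.
class RecordingBrowser {
    final List<String> actions = new ArrayList<>();

    void type(String elementId, String text) {
        actions.add("type " + elementId + "=" + text);
    }

    void click(String elementId) {
        actions.add("click " + elementId);
    }
}

// The shared search code lives here once. The class is abstract because a
// "search page" never exists on its own; it is always part of a concrete page.
abstract class SearchPage {
    protected final RecordingBrowser browser;

    protected SearchPage(RecordingBrowser browser) {
        this.browser = browser;
    }

    void search(String query) {
        browser.type("q", query);      // assumed id of the search box
        browser.click("searchButton"); // assumed id of the search button
    }
}

// Concrete pages inherit search() and add only their own services.
class InboxPage extends SearchPage {
    InboxPage(RecordingBrowser browser) {
        super(browser);
    }
}

class SettingsPage extends SearchPage {
    SettingsPage(RecordingBrowser browser) {
        super(browser);
    }
}
```

A test can then call new InboxPage(browser).search("invoice") or new SettingsPage(browser).search("filters") without either page class containing any search code of its own.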

In this blog post, I have introduced the Page Object Model and some complex scenarios around it. In the next posts of this series I will try to cover some more advanced concepts like PageFactory, LoadableComponent and @CacheLookup.

 

Ruchi Bajpai

Quality Assurance Lead

Ruchi Bajpai is a Quality Assurance Lead at 3Pillar Global. She has over 9 years of experience in Quality Assurance, Software Testing and Application Development. Her primary areas of interest are Functional Automation, Security Testing and WebServices Testing. Ruchi is a graduate of Harcourt Butler Technological Institute (HBTI) in Kanpur, India. Prior to joining 3Pillar, she worked for companies such as Tech Mahindra and Computer Sciences Corporation (CSC).

36 Responses to “Abstracting Selenium Tests using Page Object Model”
  1. Ashutosh Rawat on

    As you declared username and password like

    By usernameloc = By.id("Email");
    By passwordloc = By.id("Passwd");

    In this case code should be
    driver.findElement(usernameloc ).sendKeys(username);
    driver.findElement(passwordloc ).sendKeys(password);

  2. Jaypal Singh on

    Thanks Ruchi for an informative post. I would like to suggest that for languages like Java, instead of having two methods (valid and invalid login) we can return a common Navigation Object. The error checking can be done in Navigation Object and every new page should get the driver from Navigation Object.

    • Ruchi Bajpai on

      This could be a good approach, but it has its own challenges. If we return a generic object which works as a base page object for all the pages, then every time we would need to typecast it before using the page-specific methods. This can be done when we know which page is expected in return, but suppose you pass correct credentials and the login still fails: you will try to typecast the result to the home page object and it will fail. So here we need proper try/catch handling, and we also need to figure out which code to place in the base page constructor to know whether the opened page is the correct one. But still, your approach is valid and can be used 🙂

  3. Nitin on

    Hi,
    Its a good blog.

    I have a query, You have mentioned that if we have some common functionality like search then we can create SearchPage as abstract and required class can extend it. But in case we have more then one common functionality like SearchPage and MenuBar (both have different behavior so we can not merge it in one class like in SearchPage ) and we have to test both cases (Search and MenuBar) on any other page. In this case, we can not extends both classes. Now what to do?

    But if we have one CommonUtil class and defined all the common functionality in this class then this way we can resolved this scenario.

    What do you think? Please post your views on this…

    Thanks,
    Nitin

    • Ruchi Bajpai on

      Most programming languages don’t allow multiple inheritance, so here you can use interfaces.

      Another option is to keep the code in separate classes but have one base page class: the menubar class inherits from it, and the search class inherits from the menubar class. This way your code is distributed and still follows OOP standards. But yes, the implementation is not 100% correct. So here we have to make a choice, but going with a util class would still be the last option I would go for.

      • Nithya on

        Hi Ruchi,

        You have mentioned that we can implement multiple interfaces in this case, but then you have to define the abstract methods in the implementing class. So in this case we are rewriting the same code again, which is redundant and violates the Page Object principle.

        Thanks

  4. Pranay Roy on

    It’s really a beautiful article. I’ve cracked so many interviews by mentioning this framework. But as per my knowledge, we can have multiple conditional return statements in Java. In the above example we can put the return type of the method as Object (as generic type), creating a variable of Object type, initializing the variable in the conditional return statements and then returning the variable.

    • Ruchi Bajpai on

      I have answered the cons of this approach in the above comment. It is the only option for Java people but it needs to be implemented smartly using polymorphism.

  5. Roma on

    Nice article!
    Very interesting to look at your approach and read your views on PageFactory and LoadableComponent!

  6. Roman on

    Nice post! very interesting…
    Would be nice to look also at your ideas as to PageFactory and LoadableComponent.
    Keep going!

  7. Anitha on

    Hi Ruchi..Its really Amazing. I was completely fed up of searching and understanding about POM in selenium..But after visiting your website, I clearly understood the concept of POM especially that two type of POM approaches(Functional and Structural implementation). Now I could understand both approaches and its differences. Thanks a lot for your work. Please keep go ahead..Keep writing for us..

  8. Jeff on

    Hi Ruchi, nice post. Agree that passing Page objects from methods is a cleaner approach. I have one question though. Your code

    InboxPage = loginPage.LoginToGmail("username", "password");

    works fine for the happy path when valid details are entered into the login page.

    However, when invalid details are entered you will stay on the same page! As you are returning “this” from your method, you are assigning the LoginPage to an InboxPage, which doesn’t make any sense.

    • capital on

      I think that is the reason she goes in tackling such scenario in ruby which have multiple return types for same class methods

    • Ruchi Bajpai on

      Yes, this is a limitation of Java. Here we can place some code in the constructor of the page which validates whether we are on the correct page, using the title or by asserting on an element. If we are on the wrong page, we can throw an exception, something like IllegalStateException. Now coming to your question: consider what happens when the following code runs with wrong credentials.

      return new InboxPage(driver)

      It will call the constructor of the page and will throw the exception if we are still on login page so your test case would also fail.

      In my opinion this is the best way to handle this.

  9. Muthu on

    Thanks for the useful article on Page Object Model.

  10. akash gupta on

    hello ruchi i have some doubt regarding pom please clear
    suppose there is a 1 web page in this i have to write the script for 100 test cases and there are lot of different scenarios in single web page so what should i have to do whether i have to create separate class for each scenario or i write whole 100 test cases in single class

  11. Virender Singh on

    Hey Ruchi,

    Nicely written article and very informative. I have one question, when you say Search should be an abstract class you don’t mean that it is a pure abstract class? Search abstract class should have all the functionality implemented. Its just that relevant classes will inherit from it ?

  12. kari on

    I have a question. How would you handle a scenario where your page contains different elements based upon a different object. IE I have directTV or cableTV product page that displays different information and buttons based upon if the show is a tv series vs. a movie. The 1st will have an option to display all the episodes and a button to play an individual episode. It will have a button to go to the info page similar to the movie which which includes a button to play the show. Depending upon other criteria other buttons may or may not display.

  13. Sushmits on

    Can you pls explain how we can use List and Set in our test scripts in Page Object Model.

  14. Sushmita on

    Can you please explain how can we use the concept of List and set in Automation Test scripts using page object Model.

  15. Rajesh on

    Very nice article. Thanks for sharing your knowledge

  16. Chandan Gupta on

    Thanks a lot. This is what i was finding.

  17. Ramesh on

    Hi,

    In case there are two return types, how could this be handled in the calling class?
    i.e. it has to be either of the page types only.

    Thanks for the article

  18. Ravikumar Venkatesan on

    Very Nice article

  19. Raghu Singh on

    What is the most popular & intuitive page object framework for Java & Selenium ? I am looking for something similar to the Taza framework in Ruby.

    Taza Summary: https://github.com/scudco/taza/wiki

  20. Mallikarjun S on

    Thanks for the sharing the article, Very informative

  21. sumant on

    hiii…Ruchi

    can you please explain about below code that you have mention in above.

    return new InboxPage(driver)

  22. Vivek on

    Hi Ruchi ..

    how can we write POM for scroll and reading data on excel ?

  23. AZHARUDDIN KHAN on

    Can we use POM with data driven framework?

  24. skay on

    in the class GmailLoginPage why is the constructor name not the same as class name. Why is the constructor name LoginPage? Will this work when you try to instantiate GmailLoginPage loginPage = new GmailLoginPage (driver); ?

  25. Amit Kumar on

    Hi Ruchi,
    Thanks for providing the details of POM framework. It is very helpfull for all. I want training on framework part , can you please let me know if you c an provide training.
    Please reach me on mitu81sweet@gmail.com or skype me amitkumar_81

    Thanks

  26. Umakanth on

    Hi Ruchi,

    Thanks for the information. Could you post the link for your next blog on page factory

  27. QA on

    Hi Ruchi,

    If I am not sure whether my current Page will go to which Page, What I need to do.

    My application has diff layers.

    If everything it is a new user , it will navigate to Page 1 otherwise it will navigate to Page 2/ Page3. There is no control at all.

    Can u pls suggest me some good solution.

    Thanks,
    Automation QA
