May 6, 2016

Statistical Analysis of an Agile Project

At 3Pillar, we execute projects in a collaborative manner to deliver working software in short iterations. Typically the complete process involves a product roadmap, release planning, iteration planning, continuous integration, iteration review, retrospection, and release retrospective.

The key to delivering products to market quickly is to focus on the work at hand and deliver in small iterations called sprints. The outcome of the team's effort in a sprint is measured as “team velocity.” One way to measure velocity is to add up the story points associated with each story committed in the sprint. Story points can be based on the relative complexity and size of the stories (affinity estimation technique). For a stable team and a fixed sprint duration, this process should result in a consistent and predictable sprint velocity.

One day while discussing research ideas, we wondered if there was a way to determine the extent to which this process was meeting our expectation of consistent delivery. It would be great to know whether our projects following the agile development process were running in a predictable fashion or not!

With some further research, we found that industry uses control charts and process capability indices (e.g. Cpk or Cpm) as process performance measures to analyze the effectiveness of a process. This was great, because we now had a way to move forward.

Let us understand this a bit more. A process involves tools, materials, methods, and people engaged in producing a measurable output. Consider an assembly line in a manufacturing unit producing a circular product X. Product X is considered good if its output parameters, say its diameter, are within specification. The diameter of each product coming off the assembly line can be monitored for variability, which helps gauge whether the line producing product X is in statistical control or not. The ability of a process to meet specifications can be assessed via control charts and capability indices.
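To make the capability idea concrete, here is a minimal sketch in Python of how a capability index such as Cpk is commonly computed. The specification limits and sample diameters are hypothetical, not data from an actual assembly line.

```python
import numpy as np

def cpk(samples, lsl, usl):
    """Process capability index: how well the observed output fits
    inside the specification limits [lsl, usl]."""
    mean = np.mean(samples)
    sigma = np.std(samples, ddof=1)  # sample standard deviation
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))

# Hypothetical diameters (mm) from the assembly-line example,
# with a specification of 10 mm +/- 0.5 mm.
diameters = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1])
print(f"Cpk = {cpk(diameters, lsl=9.5, usl=10.5):.2f}")
```

A higher Cpk means the process output sits comfortably inside the specification; values above roughly 1.33 are commonly treated as capable.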

Coming back to our agile process, we assume that delivered velocity is a good measure to determine whether a project is stable within a defined specification. We aimed to check the process capability of a project by:

  • Measuring the variability of each sprint’s delivered velocity.
  • Drawing control charts to check whether a project’s output stays within statistical control limits (a minimal sketch follows this list).
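As a minimal sketch of these two steps, the variability and control limits of a project's delivered velocity can be computed directly from its sprint history. The velocity values below are hypothetical.

```python
import numpy as np

# Hypothetical delivered velocities (story points) for consecutive sprints.
velocities = np.array([32, 35, 30, 34, 31, 36, 28, 33, 35, 30])

mean = velocities.mean()
sigma = velocities.std(ddof=1)

# Control limits at Mean +/- 1 standard deviation (the warning range discussed later).
ucl, lcl = mean + sigma, mean - sigma

out_of_control = [(i + 1, v) for i, v in enumerate(velocities) if v > ucl or v < lcl]
print(f"mean={mean:.1f}, sigma={sigma:.1f}, limits=({lcl:.1f}, {ucl:.1f})")
print("sprints outside the limits:", out_of_control)
```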

Research

We performed a univariate analysis, with delivered velocity as the observed variable. We needed a way to generate delivered velocity data for our study, so we simulated project delivered-velocity data points using Bernoulli trials with the following logic (a simulation sketch follows this list):

  • We tossed two coins.
  • If the outcome was two heads (HH), we generated a new sprint with a delivered velocity outside the (Mean ± Standard Deviation) range.
  • If the outcome was mixed or two tails (HT, TH, TT), we generated a new sprint with a delivered velocity inside the (Mean ± Standard Deviation) range.
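Below is a minimal sketch of this simulation, assuming a hypothetical target mean and standard deviation for velocity; the exact way the outliers were drawn in our study may differ.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_velocities(n_sprints, mean=32.0, sd=3.0):
    """Simulate delivered velocities via two coin tosses (Bernoulli trials) per sprint:
    HH (probability 1/4) -> a velocity outside the Mean +/- SD band,
    HT, TH, TT           -> a velocity inside the band."""
    velocities = []
    for _ in range(n_sprints):
        heads = rng.integers(0, 2, size=2).sum()  # number of heads in two tosses
        if heads == 2:  # HH: an outlier sprint
            offset = rng.uniform(sd, 2 * sd) * rng.choice([-1, 1])
        else:           # HT, TH, TT: a typical sprint
            offset = rng.uniform(-sd, sd)
        velocities.append(mean + offset)
    return np.array(velocities)

print(simulate_velocities(12).round(1))
```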

Coin Toss

Next, we did exploratory data analysis by plotting a scatter diagram of the delivered velocity of each sprint for a project. Most of the points were scattered around the mean line.

Control Chart
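A control chart like this can be reproduced by plotting each sprint's velocity against the mean and the Mean ± 1 Standard Deviation limits. Here is a minimal sketch using matplotlib with hypothetical velocities.

```python
import numpy as np
import matplotlib.pyplot as plt

velocities = np.array([32, 35, 30, 34, 31, 36, 28, 33, 35, 30, 44, 31])
sprints = np.arange(1, len(velocities) + 1)
mean, sigma = velocities.mean(), velocities.std(ddof=1)

plt.scatter(sprints, velocities, label="delivered velocity")
plt.axhline(mean, color="green", label="mean")
plt.axhline(mean + sigma, color="orange", linestyle="--", label="mean + 1 sd")
plt.axhline(mean - sigma, color="orange", linestyle="--", label="mean - 1 sd")
plt.xlabel("sprint")
plt.ylabel("delivered velocity (story points)")
plt.title("Control chart of delivered velocity")
plt.legend()
plt.show()
```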

We then plotted the distribution of the delivered velocity points and calculated the descriptive statistics: mean, median, mode, quartiles, standard deviation, variance, coefficient of skewness, and coefficient of kurtosis.

Normal Distribution
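These descriptive statistics are straightforward to compute with NumPy and SciPy; the following sketch uses the same hypothetical velocities as before.

```python
import numpy as np
from scipy import stats

velocities = np.array([32, 35, 30, 34, 31, 36, 28, 33, 35, 30, 44, 31])

print("mean     :", np.mean(velocities))
print("median   :", np.median(velocities))
print("mode     :", stats.mode(velocities).mode)
print("quartiles:", np.percentile(velocities, [25, 50, 75]))
print("std dev  :", np.std(velocities, ddof=1))
print("variance :", np.var(velocities, ddof=1))
print("skewness :", stats.skew(velocities))
print("kurtosis :", stats.kurtosis(velocities))  # excess kurtosis; 0 for a normal distribution
```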

Observations

1. For projects with a sufficient duration (more than 10 sprints), the distribution is approximately normal and the control charts suggest that the process is statistically in control as long as variability is not high.

2. The generated control charts show that most of the observed values fall within the (Mean +/- 1 Standard Deviation) band. We deliberately chose an aggressive warning range, because the industry convention of Mean +/- 2 Standard Deviations would be too lax for our agile methodology.

3. Projects with a large negative skewness coefficient and a low kurtosis coefficient suggested issues with velocity predictability, indicating that the delivered velocity frequently falls below the mean value (a sketch of such a check follows these observations).

4. Projects with several points (sprint velocities) lying outside of the set variation window show a higher variation in the control charts, and thus will have a flatter probability distribution curve.

5. Projects with high variation show a zig-zag pattern in the control chart’s scatter plot.
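Putting these observations together, a simple and purely illustrative check can flag projects whose velocity distribution looks unpredictable. The skewness threshold and outlier ratio below are assumptions for the sketch, not values from our study.

```python
import numpy as np
from scipy import stats

def predictability_flags(velocities, skew_threshold=-1.0, outlier_ratio=0.32):
    """Flag possible predictability issues for one project's sprint velocities.
    The thresholds are illustrative assumptions, not values from the article."""
    v = np.asarray(velocities, dtype=float)
    mean, sigma = v.mean(), v.std(ddof=1)
    outside = np.abs(v - mean) > sigma  # points outside the Mean +/- 1 SD band
    flags = []
    if stats.skew(v) < skew_threshold:
        flags.append("velocity frequently drops below the mean (negative skew)")
    if outside.mean() > outlier_ratio:
        flags.append("too many sprints outside the Mean +/- 1 SD band")
    return flags or ["no obvious predictability issues"]

print(predictability_flags([32, 35, 30, 34, 20, 36, 28, 18, 35, 30]))
```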

By closely monitoring each project’s “control charts” and other “descriptive statistics,” we can identify the projects that are running stably and those that are signaling process inefficiencies.

If you have any questions, leave a comment below. I would greatly appreciate your feedback!