June 22, 2015

Agile Best Practices: SVP of Engineering Jeff Nielsen on the “Definition of Done”

Software development teams sometimes mean something different by the word “done” than their customers do. 3Pillar’s SVP of Engineering, Jeff Nielsen, explains in this video how a shared understanding of the “Definition of Done” can alleviate that confusion and help teams avoid so-called “iteration slop” between development sprints.

Transcription:

Imagine this conversation between a team and its customer on the last morning of an iteration.

“Well hello there, Mr. Customer. You will be happy to know we finished this user story. It is completely done.”

“Well that is great news! It’s tested and everything?”

“Yep, we just finished up the testing a little bit ago.”

“So we can turn it on in production this afternoon?”

“Well it is not actually in production yet. It’s not quite even in staging; it needs to go through staging before we get to production.”

“So let’s push it to staging.”

“Well I guess we could do that. There is a little bit of user documentation that we ought to clean up, because the change has made it potentially confusing. And there are a couple of defects that turned up in our testing. Nothing big, but things we probably ought to address; people won’t like using it with those bugs in there. Also, my tech lead was telling me that there is now duplication between these three classes that we ought to clean up at some point. And there is a slight chance that the work we did on this story might adversely affect that feature we built three months ago. We won’t know that until we do a complete regression test, and…”

“So when you said that this was done, you meant…?”

Let’s stop right there. This is a classic example (and unfortunately one that I have lived, and you may have, too) of a team calling something “done” while having knowingly deferred a bunch of work that needs to happen before a user story is really usable by somebody. And if you multiply that conversation by six or ten user stories in an iteration, you end up with a phenomenon I colloquially call “iteration slop,” where work that ought to have been done within an iteration’s time box, within its bucket, slops out of that iteration and gets all over other iterations.

And this is a problem for a couple of reasons. The obvious one is that any work I do this iteration on last iteration’s stories reduces my capacity for the current iteration. And that work tends to be unpredictable: we never know how much time we will have to carve out of this iteration’s capacity to clean up after the last one.

The second reason that iteration slop is problematic is even more insidious: it reduces our flexibility. At every iteration boundary we are supposed to be able to pivot, change our minds, or reprioritize based on what we have learned and our new understanding. If I am at an iteration boundary but still have stuff that is not quite done, I either have to abandon that work in progress or wait before I can make the changes I want to make. Get into that situation and, very quickly, the agile value proposition becomes a lot less attractive to customers.

This is the problem that the “Definition of Done” is intended to address. Imagine all the work that needs to happen for a user story to be truly production ready as a line running from one end to the other: requirements, design, coding, testing, deployment, regression testing, performance testing, actually putting the thing into the right environment. The “Definition of Done” tells us exactly how far along that line we intend to get before we call a story done.

In an ideal agile world, we would get to the end of the line for every single user story; as soon as the team called a story done, we could turn it on in production. In practice there are a variety of reasons we often don’t get there: things we haven’t figured out how to do efficiently or cheaply enough to do for every story, such as a complete regression test. Sometimes we agree to do those things once per iteration or even once per release. That is workable as long as there is an explicit understanding between the team and the customer about what we mean when we say something is done.

What is not OK is for a team to habitually or culturally, or for whatever reason, defer work that really could and should be done as part of the user story to a future iteration, and still call things done.
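To make this concrete, here is a hypothetical example of what an explicit “Definition of Done” for a user story might look like; the specific items are illustrative only and will vary by team and product:

- Acceptance criteria reviewed with the product owner
- Code complete, peer reviewed, and merged
- Unit and acceptance tests written and passing
- No known defects above the agreed severity threshold
- User documentation updated
- Build deployed to the staging environment

Anything a team knowingly leaves off such a list, such as a full regression test run once per iteration or once per release, should be called out explicitly so the customer knows what “done” does and does not include.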