The Shapes of Teams and Backlogs

Sep 15, 2023

Let us say our organization has three products or specialties. There are 10 features that need to get done across these three specialties. We have about 30 developers. How should we organize our teams: three teams that specialize in one product each? Three T-shaped teams? One large team?

During a very interesting conversation in the ProKanban Slack channel (which you all should consider joining), Steve Tendon expressed his preference for specialized teams. To quote Steve: “Better to have multiple, highly specialized, small teams - and coordinate them properly - than a large bunch of T-Shape ‘jacks-of-all-trade.’” As someone who believes that we should all be able to help with everything, I felt some cognitive dissonance. We need to prove or disprove this theory somehow.

The obvious way to do this is to find two sets of teams, one specialized and one generalized (T-shaped), have them work on the exact same set of features, and observe the resulting performance. Further, nothing else can be different between the two setups. Both sets have to have the same capabilities, same people, same cultural norms, and the exact same holiday/vacation/sickness schedule. There are too many variables that can interfere with our results. In other words, this is an almost impossible experiment to set up in real life.

We need a way to run these two separate scenarios multiple times without changing anything other than the organizational setup. This is where we turn to our old friend Monte Carlo. We can set the conditions, change one variable at a time, and observe how much of a difference that variable makes. We can run these simulations thousands of times and get an idea of how likely we are to be successful with the two sets of teams.

Setting Up The Simulations

The basic elements we need for Monte Carlo are some past data to use for simulations and an understanding of the future conditions we are simulating. For the first part, since these are fictional teams, we will use a static set of past throughput that was randomly generated. This will serve as the past 20 days of throughput from which we will sample in order to simulate the future. All teams in our simulation will use this same throughput.
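As a rough illustration of the sampling step, here is a minimal sketch in Python. The throughput numbers are made up for the example (the values actually used for the simulations are not listed here), and the function names are hypothetical. Each trial draws one past day, with replacement, for every future day and adds up the stories finished.

```python
import random

# Hypothetical 20 days of past throughput (stories finished per day).
# These values are invented for illustration only.
PAST_THROUGHPUT = [0, 2, 1, 3, 0, 1, 2, 2, 0, 1, 4, 1, 0, 2, 3, 1, 1, 0, 2, 1]


def simulate_total(history, days, rng):
    """One trial: sample a past day (with replacement) for each future day."""
    return sum(rng.choice(history) for _ in range(days))


def run_trials(history, days=30, trials=10_000, seed=0):
    """Repeat the trial many times to get a distribution of 30-day totals."""
    rng = random.Random(seed)
    return [simulate_total(history, days, rng) for _ in range(trials)]


totals = run_trials(PAST_THROUGHPUT)
print("fewest stories in 30 days:", min(totals))
print("average stories in 30 days:", sum(totals) / len(totals))
print("most stories in 30 days:", max(totals))
```

The spread between the fewest and the most stories is the whole point: the same team with the same history can land in quite different places after 30 days.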

The second requirement is the future conditions to simulate. We will simulate 30 days into the future. The organization we are running these simulations for knows the next 10 features in priority order. We are going to see which of these features are likely to get done if we are using specialized teams and which are likely to get done if we use generalized teams. Below are the randomly generated features that we will be using for simulations.

The features above are listed in priority order. F1 is a higher priority than F2, which is a higher priority than F3, and so on. The middle column shows the specialty or product of each feature. The last column is the size of the feature in stories. Those familiar with Monte Carlo Simulations using throughput will recognize this as a key component in determining when a feature will get done. The specialties and story counts for the features were randomly generated. The story counts range between 1 and 20 stories.
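To show why the story count matters, here is a small sketch of a single team working a prioritized list. The feature list and throughput history are hypothetical stand-ins, not the data used for the results in this post, and the sketch assumes that capacity left over on the day a feature finishes rolls straight into the next feature.

```python
import random

# Hypothetical data for illustration only.
PAST_THROUGHPUT = [0, 2, 1, 3, 0, 1, 2, 2, 0, 1, 4, 1, 0, 2, 3, 1, 1, 0, 2, 1]
FEATURES = [("F1", "A", 8), ("F2", "C", 15), ("F3", "B", 3), ("F4", "A", 20)]


def completion_days(features, history, days, rng):
    """One trial: return {feature name: day it finished} for one team."""
    remaining = [size for _, _, size in features]
    finished = {}
    current = 0  # index of the feature being worked on
    for day in range(1, days + 1):
        capacity = rng.choice(history)  # stories the team finishes today
        while capacity > 0 and current < len(features):
            used = min(capacity, remaining[current])
            remaining[current] -= used
            capacity -= used
            if remaining[current] == 0:
                finished[features[current][0]] = day
                current += 1
    return finished


def completion_probability(features, history, days=30, trials=5000, seed=0):
    """Share of trials in which each feature finished within `days`."""
    rng = random.Random(seed)
    counts = {name: 0 for name, _, _ in features}
    for _ in range(trials):
        for name in completion_days(features, history, days, rng):
            counts[name] += 1
    return {name: count / trials for name, count in counts.items()}


probabilities = completion_probability(FEATURES, PAST_THROUGHPUT)
for name, probability in probabilities.items():
    print(name, f"{probability:.0%}")
```

With a single team, a lower-priority feature can never be more likely to finish than the one above it, because it only starts once everything ahead of it is done.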

Scenarios

We are going to run Monte Carlo simulations for a few different scenarios. These scenarios are variations of team setups and prioritization setups. Below are the four scenarios we will be running simulations for:

Scenario 1: Three Generalized Teams, Central Prioritization. These are three teams that are T-shaped. They are not specialists in any of the products. They work off of a Central Priority, which means that when they get done with a feature, they pull the next highest-priority feature regardless of specialty.
Scenario 2: Three Specialized Teams, Specialized Prioritization. These are specialized teams. They will get a 20% boost in throughput in our simulations. That means they will run through features faster than the generalized teams. These teams work off of specialized team backlogs. They pull the highest-priority feature in their specialty, not in overall priority order.
Scenario 3: Three Specialized Teams, Central Prioritization. Similar to the previous setup, these are specialized teams. The difference now is that they pull from a unified backlog in overall priority order. They do not just pull their own specialty, but when they do work on their own specialty, they get a 20% bonus in throughput.
Scenario 4: One Large Team, Central Prioritization. In this case, the entire org acts as a single team. They pick up work in overall priority order. They work on 3 features at a time. If the three features are of different types, they are able to assign the specialists to them and get a 20% throughput boost. If two of the three features in progress are of the same type, they get a 10% boost. There is no boost when all three features are the same type.
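The boost rule for the One Large Team case can be written down as a tiny function. This is only a sketch of the rule as described above; the function name is hypothetical, and it deliberately refuses anything other than exactly three features in progress, since what happens when fewer than three features remain is not specified here.

```python
def large_team_boost(in_progress_specialties):
    """Throughput multiplier for the large team, given the specialty of
    each of the three features currently in progress."""
    if len(in_progress_specialties) != 3:
        # Assumption: the rule is only defined for three features in progress.
        raise ValueError("expected exactly three features in progress")
    distinct = len(set(in_progress_specialties))
    if distinct == 3:
        return 1.2  # all different: specialists on every feature
    if distinct == 2:
        return 1.1  # two features share a type
    return 1.0      # all three are the same type: no boost


print(large_team_boost(["A", "B", "C"]))
print(large_team_boost(["A", "A", "C"]))
print(large_team_boost(["B", "B", "B"]))
```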

One more slight tweak: if a team (except in the One Large Team case) completes two features in an area it does not specialize in, it gains specialty in that area and gets a boost the next time it works on that type of feature. This accounts for people learning new areas as they get work done.
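The three-team scenarios, including the learning rule, can be sketched roughly as follows. This is not the code behind the results in this post; it is a simplified illustration with made-up data and several assumptions: the boost is applied as a 1.2 multiplier on the sampled daily throughput, teams take turns pulling in a fixed order, leftover capacity rolls into the next feature, "two features" means two features of the same outside area, and a team with nothing left to pull sits idle.

```python
import random
from collections import Counter

# Hypothetical history and features, for illustration only.
PAST_THROUGHPUT = [0, 2, 1, 3, 0, 1, 2, 2, 0, 1, 4, 1, 0, 2, 3, 1, 1, 0, 2, 1]
FEATURES = [("F1", "A", 8), ("F2", "C", 15), ("F3", "B", 3), ("F4", "A", 20),
            ("F5", "B", 12), ("F6", "C", 6)]
BOOST = 1.2       # 20% throughput bonus on a team's specialty
LEARN_AFTER = 2   # outside features needed to gain a new specialty
EPS = 1e-9        # tolerance for floating-point leftovers


class Team:
    def __init__(self, home=None):
        self.home = home                      # None means a generalized team
        self.skills = {home} if home else set()
        self.outside_done = Counter()         # finished features per outside area
        self.current = None                   # index of the feature in progress


def pull(team, features, taken, central):
    """Next unclaimed feature: overall order if central, else own specialty."""
    for i, (_, specialty, _) in enumerate(features):
        if i not in taken and (central or specialty == team.home):
            taken.add(i)
            return i
    return None


def simulate(features, homes, central, history, days, rng):
    """One trial. Returns the set of feature names finished within `days`."""
    teams = [Team(home) for home in homes]
    remaining = [float(size) for _, _, size in features]
    taken, done = set(), set()
    for _ in range(days):
        for team in teams:
            capacity = float(rng.choice(history))
            while capacity > EPS:
                if team.current is None:
                    team.current = pull(team, features, taken, central)
                    if team.current is None:
                        break                 # nothing left to pull: idle
                i = team.current
                name, specialty, _ = features[i]
                rate = BOOST if specialty in team.skills else 1.0
                work = min(capacity * rate, remaining[i])
                remaining[i] -= work
                capacity -= work / rate
                if remaining[i] <= EPS:
                    done.add(name)
                    team.current = None
                    if specialty not in team.skills:
                        team.outside_done[specialty] += 1
                        if team.outside_done[specialty] >= LEARN_AFTER:
                            team.skills.add(specialty)
    return done


def probabilities(features, homes, central, trials=2000, seed=1):
    """Share of trials in which each feature finished within 30 days."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(simulate(features, homes, central, PAST_THROUGHPUT, 30, rng))
    return {name: counts[name] / trials for name, _, _ in features}


results = {
    "Scenario 1": probabilities(FEATURES, [None, None, None], central=True),
    "Scenario 2": probabilities(FEATURES, ["A", "B", "C"], central=False),
    "Scenario 3": probabilities(FEATURES, ["A", "B", "C"], central=True),
}
for scenario, table in results.items():
    print(scenario, {name: f"{p:.0%}" for name, p in table.items()})
```

Only two switches separate the three scenarios in this sketch: whether the teams have a home specialty and whether they pull in overall priority order. Everything else, including the throughput history, is held constant.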

Results

So what do our Monte Carlo results tell us? First, let us remind ourselves of the features we were trying to get done.

The table below shows these results. We can see the probability of each feature getting done in the scenarios listed above.

These are some curious results. In general, the most common scenario you see at organizations, Specialized Teams with Specialized Backlogs (Scenario 2), seems to have the worst performance. Generalized Teams (Scenario 1) do better. But neither of these seems to perform as well as Scenarios 3 and 4. There is very little to choose between the last two scenarios. Scenario 4 gives greater confidence for lower-priority features, but priority number three seems to have a greater chance of completion in Scenario 3. The results point towards either having a single large team or having smaller specialized teams that pull in overall priority order, not just using their own specialties.

Another Set of Features

The feature set we used in the previous simulations is just one of many possibilities. It might have biased our results. What if we used a different set of features and ran Monte Carlo? Here is another randomly generated feature set:

We use the same rules as before and run Monte Carlo simulations for the four scenarios. Now, our risk profile for the scenarios is shown in the table below.

The results follow a similar pattern as before. Specialized Teams with Centralized Prioritization (Scenario 3) gives us the best chance of finishing the top two features. Large Team (Scenario 4) is the best option for the top three features. Scenarios 3 and 4 seem to once again outperform Scenarios 1 and 2. The Specialized Teams with Specialized Backlogs (Scenario 2) has some very positive results for priorities five and six. This comes at the cost of priorities three and four.

Taking it to Eleven

We can see that the setup of the feature set has some effect on the overall results. This is where our amps can go to 11. We can randomly generate a new feature set every time we run the simulation. Each scenario works on the same feature set and we record the result. Then we generate a completely new feature set and run every scenario. We do a Monte Carlo of Monte Carlo.
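The outer loop of this "Monte Carlo of Monte Carlo" might look something like the sketch below. The throughput history is made up, and the two scenarios plugged in are deliberately simple stand-ins (a single team working in strict priority order, with and without a 20% boost), not the four scenarios from this post. The part that matters is the structure: a fresh random feature set for every run, with every scenario simulated against that same set.

```python
import random

# Hypothetical throughput history, for illustration only.
PAST_THROUGHPUT = [0, 2, 1, 3, 0, 1, 2, 2, 0, 1, 4, 1, 0, 2, 3, 1, 1, 0, 2, 1]
SPECIALTIES = ["A", "B", "C"]
FEATURE_COUNT = 10


def random_feature_set(rng):
    """Ten features in priority order: (specialty, story count from 1 to 20)."""
    return [(rng.choice(SPECIALTIES), rng.randint(1, 20))
            for _ in range(FEATURE_COUNT)]


def single_team(features, rng, days=30, rate=1.0):
    """Stand-in scenario: one team, strict priority order.
    Returns how many features (from the top) finished within `days`."""
    budget = sum(rng.choice(PAST_THROUGHPUT) for _ in range(days)) * rate
    finished = 0
    for _, size in features:
        if budget < size:
            break
        budget -= size
        finished += 1
    return finished


def monte_carlo_of_monte_carlo(scenarios, runs=5000, seed=0):
    """Chance of finishing each priority position, across random feature sets."""
    rng = random.Random(seed)
    hits = {name: [0] * FEATURE_COUNT for name in scenarios}
    for _ in range(runs):
        features = random_feature_set(rng)      # new feature set for this run
        for name, scenario in scenarios.items():
            finished = scenario(features, rng)  # same set for every scenario
            for position in range(finished):
                hits[name][position] += 1
    return {name: [h / runs for h in counts] for name, counts in hits.items()}


scenarios = {
    "baseline": single_team,
    "boosted": lambda features, rng: single_team(features, rng, rate=1.2),
}
table = monte_carlo_of_monte_carlo(scenarios)
for name, row in table.items():
    print(name, [f"{p:.0%}" for p in row])
```

Because results are recorded by priority position rather than by feature name, they can be averaged across feature sets that have nothing else in common.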

In doing this, we tend to lessen the impact of any particular feature set and get a much more generalized picture. This will give us a very decent idea of which scenario is the best way to set our organization up, regardless of the type of features that show up. The results of these simulations are shown below.

Monte Carlo Results for All Feature Sets

In general, Scenario 4 performs the best. Combining the three smaller teams into one large team provides the most robust set of results. This is not proof, but it does suggest that the common small-team advice in Agile deserves to be questioned. Scenario 3 is pretty robust as well. It seems specialized teams that pull in overall (not team) priority order perform pretty well. It is important to remember that in the first three scenarios, the teams are working on one feature at a time, so there is no context switching, except after a feature is done. Generalized Teams with a Unified Backlog do decently as well, though the lack of the 20% speed bonus definitely shows.

With all other things being equal, it does seem that the most commonly used pattern is the worst performer, especially if we are concerned about having a greater degree of confidence past our top 2 priorities. Specialized teams that have their own backlog and do not pay attention to the overall priority seem to lower the chance of getting the higher-priority features done. This is despite giving these teams a 20% throughput advantage. The speed increase optimizes the subsystem but sub-optimizes the whole.

Caveats and Conclusions

This simulation is simplistic and assumes all other things are equal. That is the only way to compare changing one variable at a time. Sure, there might be other factors that can have an effect on performance. Systems not well equipped to handle large teams will have a hard time with Scenario 4. Product Owners and managers who want to optimize for speed in the short term will often push for Scenario 2 regardless of the evidence.

This simulation removes all these variables to narrow the discussion down. We also haven't simulated the impact of absences due to people being on leave. Those will have a greater impact on smaller teams. We have also ignored dependencies. All these can be added in. The point here is to do a basic comparison of systems.

We also should not look at the feature sets as a pre-set backlog. These could be just-in-time pulled features that are right-sized without knowing the exact story count in advance. These are just the next 10 features that show up. They are listed for context. Monte Carlo of Monte Carlo is hard enough to explain without giving context about the data we are simulating against.

Overall though, it seems there are some pieces of traditional Agile guidance that are undermined by the results of the simulation.

Small teams are not better than large teams.
Having centralized organization-wide prioritization is vastly superior to team-specific prioritization.
The most commonly seen case, Small Specialized Teams with Team-Specific Backlogs, is the worst-performing case.
Having generalized T-shaped teams does not make things that much better.