<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2025-04-02T13:12:15+00:00</updated><id>/feed.xml</id><title type="html">Neville Kuyt’s blog.</title><subtitle>Words, pictures, thoughts, ramblings.</subtitle><author><name>Neville Kuyt</name></author><entry><title type="html">Agents and demons.</title><link href="/2025/03/20/Agentic.html" rel="alternate" type="text/html" title="Agents and demons." /><published>2025-03-20T15:44:43+00:00</published><updated>2025-03-20T15:44:43+00:00</updated><id>/2025/03/20/Agentic</id><content type="html" xml:base="/2025/03/20/Agentic.html"><![CDATA[<p>I’ve been exploring <a href="https://huggingface.co/docs/smolagents/index">smolagents</a> for a few weeks now. As a “words” person, I prefer that to the visual tools like N8N.</p>

<p>Firstly - it’s frustrating! The documentation is out of date, the notebooks don’t work, models get outdated. I guess that’s a side effect of how fast AI is moving right now - but it also shows that using this stuff in production is high risk! Breaking changes seem the rule, not the exception.</p>

<p>But when you get over that hurdle, it’s fascinating. Tools like <a href="https://manus.im/">Manus</a> show a glimpse of the future - and smolagents shows how you might build such systems. In a team-building workshop ages ago, we played the <a href="https://leadershipinspirations.com/wp-content/uploads/2018/07/PBJ-Challenge.pdf">“make a sandwich” game</a>. In the game, one person must write out, step by step, how to make a sandwich; the other person must follow those steps - but has to pretend to be a robot, with no knowledge of the world (there’s a variant where the other person is blindfolded). The point of this is to build communication skills - don’t assume the other person has your knowledge, or sees things the same way.</p>

<p>I think recent iterations of LLMs were a bit like that game, when played by two players without a strong shared understanding of the world. There was no end of examples of simple prompts that would send the LLM a bit loopy. Basic maths, counting the number of occurrences of a letter in a word - they all felt like you’d asked someone who had only a vague understanding of bread to make you a sandwich.</p>

<p>Newer LLMs are much better - presumably because they’ve had more and better training data, the algorithms have got better at extracting understanding from that data, and the humans building the LLMs have learned from the many hilarious mistakes posted on Twitter. We’re also seeing an increased specialization in LLMs - turns out that an LLM that writes code doesn’t need to know much about skateboards.</p>

<p>What we’re seeing with agentic architectures is multiple components working together, in unscripted ways, to solve problems. This is a rapidly evolving field, but I am exploring the following concepts, based on smolagents, and using the sandwich game to illustrate the concepts.</p>

<h2 id="planning-agent">Planning agent</h2>

<p>An agent that can come up with a high level plan to solve the problem. Usually an LLM. Given the “make a sandwich” request, it might come up with the following plan:</p>

<p><strong>Instruction: Make a sandwich</strong></p>

<p><em>Plan</em></p>

<ul>
  <li>Get two slices of bread</li>
  <li>Butter the bread</li>
  <li>Put slice of ham on one of the slices of bread</li>
  <li>Put some sliced tomato on the ham</li>
  <li>Put some pickle around the tomato</li>
  <li>Put the other slice of bread on the tomato and pickle</li>
</ul>

<p>This planning agent operates at a fairly high level, and only has to come up with a plan. But one key aspect is that it can learn from its failures.</p>

<p>If you executed the plan as-is, it would fail - the executor can’t find the ingredients, doesn’t know how to use a knife, doesn’t know how to move things around, and doesn’t understand the concept of “around the tomato”.</p>

<p>So then it would come up with other strategies - it might come up with a more detailed plan, or change the sequencing, or change the ingredients. It keeps trying until it makes a sandwich (or, at least, thinks it has!), or until it’s told to stop.</p>
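<p>That loop can be sketched in a few lines of Python. This is a toy illustration, not smolagents code: the “planner” here is a hard-coded list of candidate plans and the “executor” a fixed set of skills, where a real system would feed the failures back to an LLM and ask for a revised plan.</p>

```python
# Toy plan-execute-replan loop (illustrative only - the plans and skills
# below are made up; a real agent framework wraps an LLM here).

def execute(step, skills):
    """An executor that only knows a fixed set of skills."""
    if step not in skills:
        return f"failed: don't know how to {step!r}"
    return "ok"

def run_agent(plans, skills, max_attempts=3):
    """Try each candidate plan in turn until one fully succeeds."""
    for attempt, plan in enumerate(plans[:max_attempts], start=1):
        results = [execute(step, skills) for step in plan]
        if all(r == "ok" for r in results):
            return f"succeeded on attempt {attempt}"
        # In a real system, the failed steps would be fed back to the LLM
        # so it can produce a revised plan; here we just try the next one.
    return "gave up"

skills = {"find bread", "find butter", "spread butter"}
plans = [
    ["get two slices of bread", "butter the bread"],   # too abstract, fails
    ["find bread", "find butter", "spread butter"],    # matches known skills
]
print(run_agent(plans, skills))  # succeeded on attempt 2
```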

<h2 id="tools">Tools</h2>

<p>There are many, many tasks that we already know how to execute, for which we can just provide an implementation for the planning agent to use. This is much faster and cheaper (both in terms of resource usage and money) than having the LLM figure it out. For instance, if you want to add two numbers together, it’s <em>much</em> easier to define an “addition tool” than to have an LLM figure it out from scratch. In the “sandwich” example, we might refine the plan as follows.</p>

<p><strong>Instruction: Make a sandwich. You have access to a tool that can find things in your kitchen, and a knife tool</strong></p>

<p><em>Plan</em></p>

<ul>
  <li>Get two slices of bread
    <ul>
      <li>Find two slices of bread</li>
      <li>Find plate</li>
    </ul>
  </li>
  <li>Butter the bread
    <ul>
      <li>Find butter</li>
      <li>Use knife to spread butter</li>
    </ul>
  </li>
  <li>Put slice of ham on one of the slices of bread
    <ul>
      <li>Find ham</li>
    </ul>
  </li>
  <li>Put some sliced tomato on the ham
    <ul>
      <li>Find tomato</li>
      <li>Use knife to slice tomato</li>
    </ul>
  </li>
  <li>Put some pickle around the tomato
    <ul>
      <li>Find pickle</li>
      <li>Use knife to spread pickle</li>
    </ul>
  </li>
  <li>Put the other slice of bread on the tomato and pickle</li>
</ul>

<p>The agent would try to execute these steps - but you’d still not get a sandwich. Knowing where the bread is doesn’t magically move it to the worktop.</p>
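<p>A tool, in this sense, is just a function plus a description the planner can read when deciding what to call. Here’s a minimal sketch in that spirit - the kitchen tools and their contents are made up for illustration; smolagents does something similar with its <code>@tool</code> decorator:</p>

```python
# A "tool" is a described function the planning agent can invoke.
# The registry and kitchen contents below are hypothetical.

TOOLS = {}

def tool(fn):
    """Register a function as a tool, using its docstring as the description."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def find_item(name: str) -> str:
    """Return the location of an item in the kitchen."""
    locations = {"bread": "bread bin", "butter": "fridge", "ham": "fridge"}
    return locations.get(name, "not found")

@tool
def use_knife(action: str, target: str) -> str:
    """Use the knife to spread or slice something."""
    return f"{action} {target} with knife"

# The catalogue of descriptions is what the planner reads when deciding
# which tool (if any) matches a step in its plan.
catalogue = {name: fn.__doc__ for name, fn in TOOLS.items()}
print(find_item("bread"))  # bread bin
```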

<h2 id="code-writing-agents">Code-writing agents</h2>

<p>Some tasks depend heavily on context or situation, or arise as the planning agent develops its plan. Don’t forget, the instruction we’re giving the planning agent is “make a sandwich”; we may have an instinct as to what that requires, but the point of the agent is to figure it out for us. If we <em>knew</em> what the process was, it would be cheaper and faster (in execution time, at least) to just write the software directly, in the traditional way.</p>

<p>So, a code writing agent is a component that can take in a requirement, and write code to fulfill that requirement, then <em>execute that code</em>.</p>

<p>So, assuming we have a robot in the kitchen, and we are programming it to make a sandwich:</p>

<p><strong>Instruction: Make a sandwich. You have access to a tool that can find things in your kitchen, and a knife tool. You can write code to program a robot</strong></p>

<p><em>Plan</em></p>

<ul>
  <li>Get two slices of bread
    <ul>
      <li>Find two slices of bread</li>
      <li>Find plate</li>
      <li>Write code to move plate onto counter</li>
      <li>Write code to move bread onto plate</li>
    </ul>
  </li>
  <li>Butter the bread
    <ul>
      <li>Find butter</li>
      <li>Write code to move butter to counter</li>
      <li>Use knife to spread butter</li>
    </ul>
  </li>
  <li>Put slice of ham on one of the slices of bread
    <ul>
      <li>Find ham</li>
      <li>Write code to move ham onto bread</li>
    </ul>
  </li>
  <li>Put some sliced tomato on the ham
    <ul>
      <li>Find tomato</li>
      <li>Use knife to slice tomato</li>
      <li>Write code to move sliced tomato onto bread</li>
    </ul>
  </li>
  <li>Put some pickle around the tomato
    <ul>
      <li>Find pickle</li>
      <li>Write code to move pickle to counter</li>
      <li>Write code to move knife <em>around</em> tomato, not <em>across</em> tomato</li>
      <li>Use knife to spread pickle</li>
    </ul>
  </li>
  <li>Put the other slice of bread on the tomato and pickle
    <ul>
      <li>Write code to move slice of bread onto tomato and pickle</li>
    </ul>
  </li>
</ul>

<p>Tada: 
<img src="/assets/images/sandwich.png" alt="A sandwich, generated by AI" title="A sandwich, generated by AI" /></p>
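<p>The essence of a code-writing agent - take a requirement, generate source for it, execute it - can be sketched like this. The “LLM” is a stub returning a canned snippet; real frameworks sandbox the generated code far more carefully than this toy namespace restriction does.</p>

```python
# Sketch of a code-writing agent. The interesting part is that generated
# code is *executed*, in a namespace we control. All names are hypothetical.

def fake_llm(requirement: str) -> str:
    """Stand-in for an LLM call - returns Python source for the requirement."""
    return (
        "def move(item, destination):\n"
        "    actions.append(f'moved {item} to {destination}')\n"
        "move('plate', 'counter')\n"
        "move('bread', 'plate')\n"
    )

def run_generated_code(requirement: str) -> list[str]:
    actions: list[str] = []
    code = fake_llm(requirement)
    # Only expose what the generated code is allowed to touch.
    exec(code, {"actions": actions, "__builtins__": {}})
    return actions

print(run_generated_code("move the bread onto a plate"))
# ['moved plate to counter', 'moved bread to plate']
```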

<h2 id="agents-all-the-way-down">Agents all the way down…</h2>

<p>The “planning agent” sounds like the conductor running the show. I used that model to explain the concept, but real systems are more complicated - some of the steps may be too complex for code-writing agents to solve. Moving stuff with a robot, for instance, is hard. In fact, you might create specialized planning agents to handle those tasks - a “plate moving agent”, with access to tools like “limb mover”, “machine vision tool” etc. So the most sophisticated agentic systems are composed of multiple agents, using both pre-built tools and custom code, collaborating to reach a goal.</p>
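<p>A toy sketch of that composition - a manager that delegates tasks it can’t handle itself to specialist sub-agents. All the behaviour here is canned; the point is the shape of the delegation, not the intelligence.</p>

```python
# Agents-calling-agents: a manager delegates unknown tasks down the tree.
# Agent names and capabilities are invented for illustration.

class Agent:
    def __init__(self, name, can_handle, sub_agents=()):
        self.name = name
        self.can_handle = set(can_handle)
        self.sub_agents = list(sub_agents)

    def run(self, task):
        if task in self.can_handle:
            return f"{self.name} did {task!r}"
        for sub in self.sub_agents:
            result = sub.run(task)
            if result is not None:
                return result
        return None  # nothing in this subtree could handle the task

plate_mover = Agent("plate-moving agent", {"move plate"})
vision = Agent("vision agent", {"locate bread"})
manager = Agent("planning agent", {"write plan"}, [plate_mover, vision])

print(manager.run("move plate"))  # plate-moving agent did 'move plate'
```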

<h1 id="why-this-matters">Why this matters.</h1>

<p>Making software is expensive. It’s become cheaper over time - much cheaper! - but economics mean that software caters to “lowest common denominator” needs. There are many different types of people who write documents - novelists, lawyers, letter writers, bloggers, students - and they all have different needs. And within those groups, there are different types of people, in different states of mind - a novelist might want to write a quick first draft, or work through feedback from their editor. And yet, the number of word processors on the market is tiny - I know of four. There are a few dedicated tools aimed at specific groups - novelists have Scrivener, for instance. But no two novelists work the same way.</p>

<p>So, what if we could instruct an agent to assist with writing a document? It might create (or reuse) the user interface for writing in, monitor your writing, notice when you’re stuck and suggest how to move further, package the final work into the file format you need, and send it to its intended recipient. You’re not using a word processor, you’re writing a document.</p>

<p>Agents allow us to create tools for specific situations, and adapt to circumstances as they learn.</p>

<p>For instance, if you are planning to buy a house, you probably have some requirements that don’t neatly fit into the filter criteria on a real estate website. My ideal home would be less than 5 minutes from a beach or a forest, and walking distance to bookshops, specialist food retailers, and a few nice bars and restaurants. It would have nice views, and a garden that gets good light. It would have wall space for my bookshelves, and it would be quiet. At least 3 bedrooms, and I’d want a good-sized, well-lit kitchen. I don’t mind if it’s a flat or a house. Without an agent, I’d start by researching locations - but it would be a bit haphazard. How do you find a place near a beach or forest, with that kind of amenities? Once I had a shortlist of locations, I’d go to the real estate listings sites, use the basic filters - location, number of rooms, price - and start scrolling. I’d look at hundreds of properties, and make a short list.</p>

<p>Instead, imagine having an agent that can search for the things you’re interested in, and send you a daily shortlist; you could tell it what you do and don’t like about each option and it would refine its search.</p>

<p>Yes, you could write traditional software to find that property shortlist. It would require a fairly skilled developer, and days or weeks of effort - but it would not work for someone who wants a flat within 5 minutes of an underground station on the Central or Victoria lines, with south-facing windows. And as you discover what you do and don’t like, you’d have to get that developer to continuously change the code. In short - it’s not economically viable.</p>

<p>It’s still early days - there are many challenges with agentic architectures. LLMs are getting better and better at (what looks like!) reasoning, but they’re far from perfect. More complex scenarios quickly run into limitations - for instance, the LLM’s context window becomes too small to contain all the information needed. There are questions about how to ensure the code written by code-writing agents is safe, let alone efficient. Many of the showcases (e.g. on <a href="https://www.reddit.com/r/n8n/">Reddit</a>) are more like smart workflows than true agents.</p>

<p>Nevertheless, progress with LLMs has been accelerating. The tooling for agentic systems is improving dramatically, and there are several great libraries of tools, agents, and LLMs available on Hugging Face. I can’t wait to see what’s next!</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[Agents show the way forward in business.]]></summary></entry><entry><title type="html">Generative AI is coming for my job. Maybe.</title><link href="/2024/07/01/GenAI.html" rel="alternate" type="text/html" title="Generative AI is coming for my job. Maybe." /><published>2024-07-01T15:44:43+00:00</published><updated>2024-07-01T15:44:43+00:00</updated><id>/2024/07/01/GenAI</id><content type="html" xml:base="/2024/07/01/GenAI.html"><![CDATA[<p>I started work in 1989, in an office in Central London. I was a graduate trainee, and shared an office with a few secretaries. Not executive assistants - secretaries, whose job it was to take tapes recorded by the management team, and turn those into typed-up documents. Every morning, they’d get envelopes in the internal mail with dozens of tapes, and spend the day typing them up. Some used WANG word processors (dedicated computers that only ran word processing software); most used electric typewriters. The head of the typing pool would spell check important documents; she was formidable, and would often suggest grammatical improvements to “tighten it up a bit”. For the typists using electric typewriters, that meant starting over again…</p>

<p>We had an internal mail department, who came round twice a day to pick up and deliver envelopes for distribution within the company. For urgent messages between offices, you could ask for a fax to be sent - but only if you had a cost approval code, because faxes were expensive. Some people had direct phone lines - very much a status symbol - but most of us received calls via a central switchboard, staffed by the receptionists. Our sales team had lots of sales administrators, whose job was to take orders over the phone and make sure they were processed properly - initially, that meant filling out paper forms for the fulfilment teams; later, and partly as a result of the work I did, that meant creating orders on “the computer”, a mainframe system running our own, home-built ERP system.</p>

<p>I joined the IT department as a trainee; I learned COBOL, but gravitated towards a newfangled concept called the relational database, along with its structured query language (SQL). The IT department’s job was to write software for the mainframe - we wrote payroll, order processing, inventory management and accounting software, gradually replacing the paper systems that had run the business historically.</p>

<p>None of those jobs exist anymore - at least not in the same form. Typists, WANG operators, internal mailrooms, switchboard operators - I haven’t seen them for decades. Internal sales teams do much more than transcribing orders, and IT departments don’t write their own payroll systems any more.</p>

<p>Instead, <em>everyone</em> types their own documents, sends them via email or messaging systems, uses mobile phones. IT departments buy off-the-shelf solutions for common business tasks - usually operated by SaaS vendors; in many organisations, the IT department is primarily engaged in managing vendors.</p>

<p>And yet…there hasn’t been a huge spike in unemployment. The typists I shared an office with went on to become office managers, sales people, finance administrators. One went off to become a teacher. I caught up with one of my IT colleagues about 15 years later - they’d joined a start-up building a payroll SaaS platform.</p>

<p>My job as a programmer depended heavily on my ability to memorize syntax, and to type it into the computer correctly, the first time. There were no “visual editors” - you had to type the instructions into the screen perfectly, line by line, or start all over again. Feedback was slow - it could take a good 15 minutes for a SQL query to return results, even on modest data sets. I would usually write out my code in pencil first, maybe show it to a colleague for feedback, go to lunch and re-check it after, and then, slowly, type it into the minicomputer’s terminal. Building a new data model would take a week or more; making a mistake halfway through would often mean starting all over again. I didn’t have a computer terminal on my desk - it was covered in printouts of code, manuals, specification documents. The office soundscape was dominated by daisywheel typewriters, dot matrix printers, and the squeals of the (then modern) fax machines.</p>

<p>My project was to design and implement a database to award customer rebates. The model was fairly complex - rebates were awarded based on a wide range of criteria, there were exceptions everywhere, and identifying the requirements meant reading the rebate contracts, which was not much fun. The whole process took 9 months, and covered only a subset of customers. Building and populating the data schema took months; writing the queries took longer.</p>

<p>Even without generative AI, that process today would take days or weeks. The tooling is <em>so much better</em>. I barely need to remember syntax - the IDE just does that for me. I can automate everything so that I can easily go back and change fairly big design decisions without throwing away work. And even with large amounts of data, queries are lightning fast.</p>

<p>And today, as an experiment, I fed what I remember of that project (it was a long time ago!) into ChatGPT and asked it to design a schema and key queries, and it did so in minutes; with a few tweaked prompts, we got to a pretty credible solution in about 30 minutes. And a quick Google search shows that there are several off-the-shelf SaaS rebate management systems.</p>
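<p>I no longer have the original design, but a toy miniature of the idea - a hypothetical rebate schema and the “rebate owed” query - fits in a few lines of SQLite today, where it once took months:</p>

```python
# A hypothetical, heavily simplified rebate schema - the kind of starting
# point the ChatGPT experiment produced, not the original 1989 design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rebate_rule (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        threshold REAL,   -- minimum spend to qualify
        rate REAL         -- rebate as a fraction of spend
    );
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO rebate_rule VALUES (1, 1, 100.0, 0.05)")
conn.executemany("INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
                 [(1, 80.0), (1, 40.0)])

# Rebate owed: total spend times rate, for customers over the threshold.
row = conn.execute("""
    SELECT c.name, SUM(p.amount) * r.rate AS rebate
    FROM customer c
    JOIN purchase p ON p.customer_id = c.id
    JOIN rebate_rule r ON r.customer_id = c.id
    GROUP BY c.id, r.rate
    HAVING SUM(p.amount) >= r.threshold
""").fetchone()
print(row[0], round(row[1], 2))  # Acme 6.0
```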

<p>So the cost of solving a fairly common business problem has gone from “months or years of developer time” to “a small percentage per transaction”. The result is not that we have completely satisfied the demand for rebate systems, and then laid off everyone who worked on those systems - the result is that more and more organisations offer rebates. I now get rebates on train tickets, coffee purchases, holidays etc.</p>

<p>I read an article exploring this concept (can’t find it now!) about how reducing the cost of electrical lighting hasn’t reduced our consumption of electricity - it has <em>increased</em>, because things are now economically (and technically) viable that previously were not, for instance the huge <a href="https://www.thesphere.com/">sphere in Las Vegas</a>.</p>

<p><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/The_Sphere_as_Mars%2C_view_from_my_hotel_room_at_Harrah%27s%2C_Las_Vegas%2C_Nevada%2C_USA_%2853112535707%29.jpg/1920px-The_Sphere_as_Mars%2C_view_from_my_hotel_room_at_Harrah%27s%2C_Las_Vegas%2C_Nevada%2C_USA_%2853112535707%29.jpgg">Las Vegas sphere</a>
Image credit: By Cory Doctorow from Beautiful Downtown Burbank, USA - The Sphere as Mars, view from my hotel room at Harrah’s, Las Vegas, Nevada, USA, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=136454844</p>

<p>This huge building features 54,000 m² of external LED displays, which have shown a range of stunning images. The energy consumption of the Sphere, running at full capacity, is estimated to be 28 megawatts - enough to power 21,000 homes. That’s a lot - but without LED technology, it would have been a multiple of that (leaving aside the technical challenges of using incandescent bulbs in a similar way). Someone, somewhere, figured out that it’s economically viable to expend 28 MW in return for the revenue from a sell-out gig.</p>

<p>The same is true for software. As the cost comes down, consumption goes up, and in a business context, software generally reduces costs. That money may disappear in shareholder pockets - but more commonly, those reduced costs for a specific capability open up new opportunities, which in turn create a demand elsewhere.</p>

<h1 id="augmentation-versus-replacement">Augmentation versus replacement</h1>

<p>I’m a sceptic when it comes to AI <em>replacing</em> huge numbers of people. Remember how self-driving cars were <a href="https://www.bbc.co.uk/news/technology-53349313">“very close”</a>? Even though the various LLMs we see now are super impressive, there are many tasks humans do very easily at which they fail - and the hallucination problem is not going anywhere. When an AI beat a human at Go, many people saw that as a quantum leap in AI capability - but there are relatively <a href="https://arstechnica.com/information-technology/2022/11/new-go-playing-trick-defeats-world-class-go-ai-but-loses-to-human-amateurs/">simple ways to defeat those AIs</a>.</p>

<p>At a very simplistic level, I think all these limitations come from the fact that the AI systems we build today don’t have an internal model of the world. When you ask an LLM to write a joke about a cat, it doesn’t think about what it knows about felines and their behaviour, and look for things it believes are “funny” - it effectively searches a huge training set, in ways we can’t quite express, for data that matches cat, joke, etc.</p>

<p>Now, in my domain - writing software - there are <em>many</em> tasks that can be learned from a training set. To connect to a database, you can simply find examples of connecting to a database from your training set, without understanding the concept of “database”.</p>

<p>But to write a complex system that represents entities in the real world, where you’re unlikely to find a close match in your dataset, you do kinda need a model in your mind of those entities, and a process for discovering more about them.</p>

<h1 id="so-what">So what?</h1>

<p>I think these things are all true:</p>

<ul>
  <li>AI tools will improve the productivity of professionals, just like IDEs and virtualization improved the productivity of software developers in the past. This increased productivity may reduce demand for those professionals - but I expect that the reduced cost will increase demand.</li>
  <li>AI tools <em>may</em> reduce the need for some roles - especially “junior” roles. In my experiments with ChatGPT and Copilot, I found that much of the work that would normally go to a more junior developer can be done by an LLM. “Here’s the outline for a method - fill it out, add error handling and make sure it’s robust”, “Here’s a schema and some sample queries, write a new query to do X”, “Here’s a failing unit test, improve the code so that it passes”.</li>
  <li>There is an upper limit on the improvement of LLMs to handle real-world cases. The improvement curve is an S-curve, not a J-curve. I believe this for several reasons: the large LLMs have used all the easily accessible training data, the cost (energy, compute, whatever) of improvements is not linear with the quality of the improvement - adding another dimension increases the cost exponentially. And, philosophically, without a model of the world, LLMs depend on memorization and retrieval, but many real-life situations don’t lend themselves to that model.</li>
</ul>

<h2 id="digression-model-of-the-world">Digression: model of the world.</h2>

<p>Let’s take, as an example, the challenge of recommending a book to someone.</p>

<p>Before Amazon, there was a website (I’ve forgotten the name now) where you could say which book categories you liked, and which you didn’t, and it would recommend a “secret mix” of books you might like, which looked suspiciously like the best sellers in the categories you liked. It was not particularly useful.</p>

<p>Then came Amazon, with its “People who bought this also bought that” recommender. At the time, it felt like magic - it felt like Amazon had looked at my book cases, and <em>knew</em> what I liked. The way it did that was to find people whose choices were like mine, and show me things they’d bought that I had not. While it <em>felt</em> like magic, it doesn’t necessarily work for new books - because not enough people have bought the book yet. It also doesn’t really generalize - my preference in books would not necessarily carry across to my taste in cheese.</p>

<p>Now, when you ask ChatGPT for recommendations, the results aren’t inspiring. A prompt like “Here are 10 authors I like, can you recommend any books?” gives you back many books by those authors, and then some very obvious best sellers within the same categories. Asking for less obvious recommendations doesn’t really do much. Engaging in conversation - answering questions etc. - still doesn’t seem to create compelling recommendations. For instance, even though I said “I’ve read all of this author’s books”, it still recommended several books by that author… If you then ask ChatGPT to recommend cheese based on your literary tastes, you get a generic list of cheeses.</p>

<p>The way the LLM does this is hard to explain - <a href="https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/">Stephen Wolfram’s explanation</a> is the most user friendly one I’ve found. But what matters is that it is <em>not</em> forming an understanding of books, people, tastes, cheese - it’s building a huge library of “what’s the most likely next (part of a) word given this chain of other (parts of) words”. This creates very credible text, and it’s good enough now that it broadly answers the question. But it doesn’t really understand. You can ask it ever more ridiculous questions - “based on my literary tastes, which shampoo would you recommend?” and it keeps answering. The answers look credible - but I’d be amazed if my taste in books predicts which shampoo I’d like.</p>
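<p>A toy version of that mechanic - predict the likeliest next word from counts of what followed it before, with no idea what the words mean - is a bigram model. Real LLMs condition on far longer contexts and operate on sub-word tokens, but the principle is the same:</p>

```python
# Bigram "next word" predictor built from raw counts - a (very) stripped
# down analogue of "most likely next token given the words so far".
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

nxt = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    nxt[a][b] += 1

def predict(word):
    """Return the word most often seen after `word` in the corpus."""
    return nxt[word].most_common(1)[0][0]

print(predict("the"))  # cat - seen twice after 'the', vs 'mat'/'fish' once
```

There is no model of cats or fish in there - only counts. Scale the corpus and the context length up enormously and you get credible text, still without understanding.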

<p>Compare that with how a human would approach this problem - humans create an internal model of someone based on what they tell you, and, somehow, match that model to books. Or cheese. I don’t think we know quite how we do it - partly based on experience (“my uncle liked Len Deighton novels, and he <em>loved</em> Red Leicester!”), partly based on stereotypes (“Len Deighton is a bit old fashioned, quite blokey, I bet you also like Red Leicester”), partly on half-remembered snippets (“I think there’s a bit in one of the Deighton novels where he mentions Red Leicester - let’s go for that!”). We’re quite likely to say “I’m not sure there’s a reasonable way to go from books to cheese”.</p>

<p>The same is true when we ask the LLM to write software. It does a pretty decent job - and because it’s probably seen lots of examples of similar code, it gives you something that will probably work. And for many applications, there’s prior art - if not for the whole thing, then for sub systems. But I don’t believe that an approach that is basically “find the closest analog to the prompt in my database” will allow you to build brand new applications.</p>

<p>Much of software development - and knowledge work in general - is not about “writing code”. It’s about navigating the context, exploring and balancing the many tradeoffs, choosing what to optimize and what to downplay. It’s about breaking large, complex problems down into chunks you can reason about. It’s about thinking about edge cases - not just the things that go “right”, but all the ways they can go wrong. And to do that, we humans definitely depend on heuristics - “if it’s an ecommerce application, it probably needs to deal with Black Friday traffic spikes”. But we also depend on our understanding of the problem domain - “if it’s an ecommerce site selling medical equipment, there may be regulatory requirements”, “if it’s an ecommerce site selling high-value items, we should think about fraud prevention”.</p>

<p>So, consider the following requirement:</p>

<blockquote>
  <p>design a database schema for an ecommerce site</p>
</blockquote>

<p>You can pass this as a prompt to ChatGPT and get a perfectly reasonable response. Give it to a moderately experienced developer, and they’ll ask about your business domain - regulated? High fraud risk? What sort of traffic spikes do you expect? How many products do you have? How many customers are you expecting? Is this a one-off experiment, or the basis for a strategic asset? In their head, they’ll convert your initial requirement into something like:</p>

<blockquote>
  <p>Design a database schema for a website that sees occasional traffic spikes, sells low- and medium-value items, some of which are regulated. The design needs to be extensible.</p>
</blockquote>

<p>Sure, if you feed <em>that</em> prompt into ChatGPT, it will do an equally credible job - but right now, I wouldn’t be confident that “credible” is the same as “definitely meets my requirements”.</p>

<h1 id="get-to-the-point-nev">Get to the point, Nev!</h1>

<p>The reason humans can recommend books, or design database schemas, in ways that the LLM can only achieve once you’ve prompted it with all the additional context, is that we have a model of the world that goes beyond “context and prompt”.</p>

<p>I think that any job that depends on manipulating information - and that’s pretty much <em>every</em> job - can be improved by our current generation of AI tools. The jobs where people interact with that information through digital means are the obvious starting point. The productivity enhancements are likely to increase the number of jobs that use digital tools - for instance, <a href="https://www.chefrobotics.ai/">robotics in the food industry</a>.</p>

<h2 id="super-human">Super human?</h2>

<p>Computers have been superhuman for - well, since the abacus. A computer can do sums faster than a human can. It can store, search and process more information than a human can. It can control motors with more accuracy, sense feedback more granularly, and try more possible solutions than a human can. And over time, the area where computers are “super human” has grown - autopilots, chess and go, spotting anomalies on x-rays.</p>

<p>At this point, I think we can say that computers can become superhuman in domains where one or more of the following apply:</p>

<ul>
  <li>the rules can be known, and expressed. A rule that can be known and expressed is also known as an algorithm. Algorithms now scale to planet-size data - Amazon recommends books, your phone can guess the next word you want to type, Google can find the most relevant web page for your search term.</li>
  <li>the rules cannot be known or expressed, but there’s a bounded universe and a set of goals and constraints that can be expressed formally. In chess, the constraints are the rules, and the goal is to win. Even though the number of possible chess games is effectively <a href="https://en.wikipedia.org/wiki/Shannon_number">too big to try them all</a>, machine learning algorithms can search the space and find ways to win.</li>
  <li>the rules cannot be known or expressed, but there’s a lot of training data. While the rules of grammar can be expressed, the rules of “creating meaningful text” cannot; but by feeding huge amounts of text to ChatGPT, it can find “rules” that allow it to generate meaningful text. ChatGPT is superhuman in the sense that it can create meaningful text in ways humans cannot - my abilities top out at about 5 languages; ChatGPT can generate meaningful text in far more languages, and translate credibly between them.</li>
</ul>

<p>In all those domains, I think the rapid increase in processing power and available data are likely to lead to equally rapid improvements of AI capability.</p>

<p>But…</p>

<p>But becoming superhuman in more than one domain - chess <em>and</em> language generation, detecting anomalies in X-rays <em>and</em> recommending books - is nowhere near on the cards. The first LLMs would make hilarious mistakes when asked maths questions - they are now much better, but that’s because they outsource the maths to a component that is <em>not</em> an LLM.</p>

<p>So, theoretically, you could imagine a huge AI made up of an LLM to do language processing and a bunch of “co-processors” to do specific tasks like play chess, or do sums, or recognize things in images. It would almost certainly be a terrible driver, though - because the problems we solve when driving are not really about language, or even expressible as language.</p>

<p>Even without a model of the world - an understanding of cause and effect, rather than just the next most likely token - you can imagine an LLM being trained to ask questions about requests, allowing it to self-refine the prompt that executes the task (I think that would be an interesting experiment!). <a href="https://arxiv.org/abs/2205.11916">Zero-shot chain-of-thought prompting</a> shows how an LLM can be encouraged to “reason”; it wouldn’t be too big a leap to include “ask questions” in the interaction model to improve chain-of-thought.</p>

<p>Nevertheless, I put “reasoning” in scare quotes. I think it’s more accurate (right now, at least) to say that LLMs exhibit behaviour that <em>looks like</em> reasoning. Or rather, it looks like <em>human</em> reasoning. But, fundamentally, it isn’t. And there are a number of limitations that currently don’t have a credible resolution - mostly because of the lack of a model of the world. When an LLM encounters a situation it cannot match to its training data, its behaviour is…unpredictable. And there’s a limit to context (the amount of information the LLM takes into consideration when deciding what to do next) and to the depth and subtlety of reasoning you can get out of an LLM. As I understand it, the cost of scaling context and depth grows exponentially - so it’s not obvious the “LLM as Swiss Army knife of AI” model is sustainable.</p>
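<p>The “ask questions to self-refine the prompt” experiment can be sketched in a few lines. This is a hypothetical illustration, not code from any real agent framework: <code>complete()</code> stands in for whatever LLM completion API you use, and is stubbed here with canned replies so the control flow is runnable.</p>

```python
def complete(prompt: str) -> str:
    """Stubbed LLM call - replace with a real API client."""
    if "list any clarifying questions" in prompt:
        return "What audience is the summary for?"
    return f"[response to a prompt of {len(prompt)} characters]"

def self_refining_prompt(task: str, answer_question) -> str:
    # Step 1: ask the model what it needs to know before attempting the task.
    questions = complete(
        f"Before doing this task, list any clarifying questions:\n{task}"
    )
    # Step 2: answer the questions (a human, or another model) and fold them in.
    answers = "\n".join(
        f"Q: {q}\nA: {answer_question(q)}" for q in questions.splitlines() if q
    )
    # Step 3: execute the refined prompt.
    refined = f"{task}\n\nContext from clarifying questions:\n{answers}"
    return complete(refined)

result = self_refining_prompt(
    "Summarise this report.",
    answer_question=lambda q: "Non-technical executives.",
)
print(result)
```

<p>The interesting design question is step 2 - whether a human answers the questions, or a second model with access to more context does.</p>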

<h1 id="prediction">Prediction.</h1>

<p>I’m in my late 50s. I don’t think my job as a software consultant will be replaced by AI before I retire. But I <em>do</em> expect to use LLMs to become more productive, and I expect to see many roles transformed - software development will be ever less about remembering the magic invocations that the processor understands, and ever more about understanding the problem domain and describing it precisely and consistently.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[After using ChatGPT for a year, Copilot for 6 months, and my brain for nearly 6 decades, I can see some parallels with the past.]]></summary></entry><entry><title type="html">Team Topologies - journey, not destination?</title><link href="/2024/07/01/Team-Topologies.html" rel="alternate" type="text/html" title="Team Topologies - journey, not destination?" /><published>2024-07-01T15:44:43+00:00</published><updated>2024-07-01T15:44:43+00:00</updated><id>/2024/07/01/Team-Topologies</id><content type="html" xml:base="/2024/07/01/Team-Topologies.html"><![CDATA[<p>A colleague introduced me to <a href="https://teamtopologies.com/">Team Topologies</a> soon after the book came out. We both enjoyed the book - it finally gave us a framework for the conversations we’d been having. We were working for a large retailer, replatforming their online store, and were seeing many of the challenges that led to the Team Topologies approach. Simple changes would require the alignment of many teams, work was passed around like a hot potato, and nobody was clear about who owned what. There were around a dozen or so teams, some aligned with “business features” like product discovery or fulfillment, some aligned with technologies like the content management system, and some aligned with architectural layers like “the front end team”.
The application architecture, unsurprisingly, was incredibly complex, with diffuse responsibilities, duplication and “here be dragons” subsystems.</p>

<p>By viewing the organisation through the Team Topologies lens, we could see what the problems were - and a route to improvement. But we were in the middle of a high-stakes project, with severe delivery pressure and only limited appetite for change - the client organisation worked with several vendors as well as their inhouse team, and everyone had commercial and political interests to worry about. Most of the writing on Team Topologies takes a strategic approach to designing the organisation - <a href="https://www.wardleymaps.com/">Wardley mapping</a>, finding fracture planes, considering whether something could be an independent service, etc.</p>

<p>Our context didn’t support such a strategic approach - the politics were too complicated. Instead, we took a very tactical view - we used a new feature request that the business was invested in as the tracer bullet, and looked at all the hand-overs between teams that were required to deliver the feature. It went something like this:</p>

<blockquote>
  <p>Business stakeholder -&gt; Product Owner -&gt; Business Analyst -&gt; Technical Architect -&gt; Tech lead(s) -&gt; Backend Developer(s) -&gt; Front-end developers -&gt; Quality Assurance -&gt; Release Manager -&gt; Release team.</p>
</blockquote>

<p>This looks like mini waterfall, but in practice it <em>was</em> more agile - I’ve only captured the responsibility handover, not the many conversations that happen along the way, and the team was pretty good at delivering iteratively and incrementally. Nevertheless, there were many handovers, the overall duration for a small slice of work would routinely be many weeks or months, and the business stakeholder could never really be sure who was responsible for what. And often, features required input from designers, copywriters, MarTech specialists, etc.</p>

<p>We got buy-in from the business stakeholders to create a dedicated team. There was a real incentive - the purchase journey this team were working on was complicated, the items were expensive, and there were very high returns, so an improved purchase journey was commercially valuable. We started by creating multi-functional teams, so the handover chain went</p>

<blockquote>
  <p>Business stakeholder -&gt; Team(s) -&gt; Release Manager -&gt; Release team.</p>
</blockquote>

<p>We had enough work for 2 teams, so split the work between “find, configure” and “buy, track”. The interface between those teams was complex - they had to agree on what the “product” was, what the rules were for allowing customers to buy it, etc. We decided not to address that in the first instance - we just wanted to get started, do the smallest possible transformation.</p>

<p>This worked well. The teams were routinely taking work from idea to working software in a matter of days, and deploying to a QA environment. Our business stakeholder was delighted - both with the product, and the way of working. We had several sessions where we took feedback and deployed quick changes to the QA environment before the meetings finished - the business stakeholders were very impressed with this.</p>

<p>They asked for more - so we agreed that deploying to production, not just a QA environment, would be the next step. We talked to the operations team who historically had owned the path to production, and agreed an architectural solution that would keep our new teams’ work separate from the rest of the application, and had a small team extend the CI/CD pipelines that already deployed to QA to go to our production environment. The team was now deploying multiple times per day, and at a very high quality level.</p>

<p>We gradually extended the number of stream-aligned teams - one or so per quarter - and once we had 4 or so, invested in a platform team to take care of some of the repeated work that those teams kept bumping into. As there were a few shared concepts used by several stream-aligned teams, we built some “X-as-a-service” teams who owned the semantics of those shared concepts and exposed them via an API, rather than having multiple teams work in the same codebase. They became “complicated subsystem” teams.</p>

<p>It wasn’t pain-free. We had some really challenging situations around line management and HR - individuals felt they were losing connection with the people in their discipline, and line managers found it difficult to keep up with their reports. And of course there was continued pressure from the business stakeholders to deliver, to go faster - and sometimes, the teams stumbled. We invested in more and more “shift left” on quality - but some bugs snuck through to production. The shift to “you build it, you run it” was an HR challenge, with different pay and time-off-in-lieu for different teams/vendors. There were business priorities that cut across our stream-aligned teams and re-introduced handovers, dependencies, bottlenecks.</p>

<p>But we had set up metrics collection from the start, and while “value” is hard to measure, we saw impressive improvements on the <a href="https://docs.gitlab.com/ee/user/analytics/dora_metrics.html">DORA metrics</a>. Deployments went from once a fortnight to multiple releases per day; lead time for change went from “months” to “days”. We had some challenges with change failure rate once we added more teams - but we learned from that experience and improved quickly.</p>
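<p>The bookkeeping behind those numbers is genuinely small. Here’s a minimal sketch of deriving three of the four DORA metrics from a deployment log - the dates and failure flags are invented for illustration, not taken from this engagement:</p>

```python
from datetime import date
from statistics import mean

# (deploy_date, commit_date, failed) - one record per production deployment.
deployments = [
    (date(2024, 1, 3),  date(2023, 11, 1), False),  # "months" of lead time
    (date(2024, 6, 10), date(2024, 6, 7),  False),  # "days" of lead time
    (date(2024, 6, 10), date(2024, 6, 9),  True),
    (date(2024, 6, 11), date(2024, 6, 10), False),
]

def dora_summary(records):
    # Lead time for change: days from commit to production deployment.
    lead_times = [(deployed - committed).days for deployed, committed, _ in records]
    failures = sum(1 for *_, failed in records if failed)
    return {
        "deployments": len(records),
        "mean_lead_time_days": mean(lead_times),
        "change_failure_rate": failures / len(records),
    }

print(dora_summary(deployments))
```

<p>The value was never in the computation - it was in plotting those few numbers per quarter and watching the trend.</p>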

<p>More importantly, perhaps - the relationship between the “business” (ultimately, the budget holders) and the delivery organisation improved. The business folk felt they understood how the work flowed, who was responsible for what, and what the trade-offs were.</p>

<h1 id="what-did-i-learn">What did I learn?</h1>

<h2 id="thats-all-very-well-in-practice-but-it-will-never-work-in-theory">That’s all very well in practice, but it will never work in theory.</h2>

<p>Our initial thoughts about how to identify our teams were a long way from where we ended up. While we had a deep understanding of the business domain, and the bounded contexts made sense, it was mapping the handovers in a couple of example features that identified the biggest wins - cross-functional teams, aligned with a key business win. In retrospect, it was a subset of one of the higher-level domains we’d identified. Once the concept of teams with limited handovers had sunk in, we found several opportunities we hadn’t previously identified - especially in the “X-as-a-service” teams.</p>

<h2 id="some-structures-dont-fit-neatly-into-the-team-topologies-paradigm---and-thats-ok">Some structures don’t fit neatly into the Team Topologies paradigm - and that’s OK.</h2>

<p>We <em>really</em> struggled with some of the folk we worked with. User Experience designers, for instance, wanted - needed! - to have a view across the entire application. They might contribute to a stream-aligned team in a specialist role, for a fairly short period, by creating UX designs, and then have feedback sessions as work progressed. We couldn’t align them with any one value stream, and they weren’t enablers or complicated subsystems, so we decided not to worry about it. It was fine.</p>

<h2 id="small-wins-chained-together">Small wins, chained together.</h2>

<p>We did not have a mandate from the executive team to make wholesale organisation changes. There were good reasons for this - but in retrospect, I think the decision to start small was the right one. We made quick progress, learned from reality, and found a good cadence. It was very gratifying to see the teams in the “legacy” organisation look enviously at the new teams. We created a Slack channel to celebrate small wins, and it was the busiest channel, with lots of positivity.</p>

<h2 id="numbers-matter">Numbers matter.</h2>

<p>My colleague insisted we use the DORA metrics as a way of evaluating our progress, and they were right. It’s very easy for the new normal to become - well, the new normal. By looking at trends over time, we could see where we were making progress, and adjust where necessary. Keeping track of just 4 numbers (though supported by lots of underlying metrics!) was not a huge burden, and when business folk complained about how long things were taking, we could show objective statistics - “it used to take 2 months for a change to make its way to production, now it’s a few days” was a very powerful argument.</p>

<h2 id="people-matter-more">People matter more.</h2>

<p>This was a challenging engagement. It was high pressure, there were lots of business changes going on that made it hard for people to commit to decisions, and the entire industry was in turmoil. I think most of the team really enjoyed the clarity and focus of the new team structure - but change can be emotionally stressful, and we had a few people who felt very uncomfortable. We supported them as best we could, but could have done a better job, I think.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[The Team Topologies book is 5 years old. What have I seen, learned, and imagined?]]></summary></entry><entry><title type="html">“Estimating is hard. Maybe impossible. So we must do it.”</title><link href="/2023/02/02/Estimating.html" rel="alternate" type="text/html" title="“Estimating is hard. Maybe impossible. So we must do it.”" /><published>2023-02-02T15:44:43+00:00</published><updated>2023-02-02T15:44:43+00:00</updated><id>/2023/02/02/Estimating</id><content type="html" xml:base="/2023/02/02/Estimating.html"><![CDATA[<h1 id="why-estimate">Why estimate?</h1>
<p>Software estimating has been a challenge for as long as I can remember. At the beginning of my career, we used <a href="https://en.wikipedia.org/wiki/COCOMO">CoCoMo</a> and <a href="https://en.wikipedia.org/wiki/Function_point">function point analysis</a> - rigorous, evidence-based methods that nevertheless rarely yielded “accurate” estimates.</p>

<p>Since then, there’s been a lot of thinking/writing, especially in the context of Agile. Some people say <a href="http://www.ines-panker.com/2020/11/04/estimation-is-bs.html">it’s a delusion</a>, or <a href="https://mdalmijn.com/p/11-laws-of-software-estimation-for-complex-work">fraught with challenges</a>. Mike Cohn wrote a <a href="https://www.mountaingoatsoftware.com/presentations/agile-estimating">great book</a> on the topic.</p>

<p>But whether we like it or not, those who pay for the software will want to understand what it will cost, before they release the funds.</p>

<p>That’s the paradox. Most people I’ve worked with agree that estimating is hard or impossible. But in order for the work to go ahead, we need an estimate.</p>

<h1 id="the-cone-of-uncertainty">The cone of uncertainty.</h1>

<p>This problem tends to be biggest right at the beginning of the project/product. There’s an idea or a hypothesis, and some degree of confidence it will be valuable - sometimes described in financial terms, sometimes just as a broad outline.</p>

<p>So, the first question becomes “I have this idea, I think it’s valuable, what would it take to bring it to life?”. Often, the next step in that process is to flesh out the idea - to imagine what the interactions would be, what it might look like, how it would work. And then you have to figure out what it would take to build the idea.</p>

<p>And this is where you run into the <a href="https://en.wikipedia.org/wiki/Cone_of_Uncertainty">cone of uncertainty</a> - early in the lifecycle, there is so much uncertainty that any estimates are likely to be based on missing or inaccurate information. You don’t know what you’re building, who’s building it, what constraints they are working under, you don’t know what’s going to go wrong - or what’s going to go better than expected.</p>

<p>And yet, in order to make progress, you have to allocate resources - people, money, attention.</p>

<h1 id="turning-the-return-on-investment-roi-conversation-upside-down">Turning the Return on Investment (ROI) conversation upside down.</h1>

<p>So, how do you come up with even a broad outline of the cost (in time, people, money or whatever) at this stage?</p>

<p>I like to use a simple framework for evaluating ideas - <a href="https://www.svpg.com/four-big-risks/">Marty Cagan’s 4 attributes of risk</a>.
The idea is that you take each feature making up the bigger idea, at a fairly coarse level, and score them on 4 attributes:</p>

<ol>
  <li><strong>Valuable</strong> - will this bring value to users? If it’s a commercial product, would they pay for it?</li>
  <li><strong>Usable</strong> - can we make this something users can interact with, ideally with some degree of joy?</li>
  <li><strong>Feasible</strong> - do we know how to build this?</li>
  <li><strong>Viable</strong> - will this align with the organisation’s goals and constraints?</li>
</ol>

<p>The trick is to find the area where those four attributes overlap - that’s the subset of ideas that have value to users, will be usable, make business sense, and can be delivered.</p>

<p>Here’s an example from a (long-since defunct) start-up I worked for. It was an ecommerce site, selling custom framed prints online. We had thousands of ideas, ranging from obvious “must-haves” like a product catalogue, basket and checkout, to some fairly wild and exciting options like a virtual art gallery with digital sales people.</p>

<p>Here’s how we started figuring out our product strategy. This is an abbreviated form - there were about 20 ideas - and we spent an afternoon working on the evaluation between business, technology and design.</p>

<table>
  <thead>
    <tr>
      <th><strong>Idea</strong></th>
      <th><strong>Valuable</strong></th>
      <th><strong>Usable</strong></th>
      <th><strong>Viable</strong></th>
      <th><strong>Feasible</strong></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Product catalogue</td>
      <td>10</td>
      <td>?</td>
      <td>10</td>
      <td>10</td>
    </tr>
    <tr>
      <td>Visual preview of product</td>
      <td>10</td>
      <td>8</td>
      <td>9</td>
      <td>?</td>
    </tr>
    <tr>
      <td>Basket</td>
      <td>10</td>
      <td>7</td>
      <td>10</td>
      <td>10</td>
    </tr>
    <tr>
      <td>Checkout</td>
      <td>10</td>
      <td>?</td>
      <td>10</td>
      <td>?</td>
    </tr>
    <tr>
      <td>Virtual gallery</td>
      <td>?</td>
      <td>8</td>
      <td>?</td>
      <td>?</td>
    </tr>
    <tr>
      <td>Rich product content</td>
      <td>8</td>
      <td>10</td>
      <td>4</td>
      <td>6</td>
    </tr>
  </tbody>
</table>

<p>We could see immediately that there were some ideas with question marks - we simply didn’t have enough information to even guess. Our designer was concerned about the usability of the product catalogue - how could we help customers choose between the hundreds of thousands of products we had? For the “visual preview”, we weren’t sure how to build it - there were several options, but none looked like a great match. Checkout was complex because shipping options were an important consideration, but both the design and tech teams needed more information in order to figure out how to make it work. The Virtual Gallery idea had a clear design approach, but our business stakeholders weren’t really sure if customers would value that feature, and the technical feasibility was unclear.</p>
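<p>The triage logic behind that table is simple enough to sketch. A minimal illustration, using the example scores from the table above with <code>None</code> standing in for the question marks (the data structure is just an illustration, not a real tool we used):</p>

```python
# Score each idea on the four attributes; None marks a question mark.
ideas = {
    "Product catalogue": {"valuable": 10, "usable": None, "viable": 10, "feasible": 10},
    "Visual preview of product": {"valuable": 10, "usable": 8, "viable": 9, "feasible": None},
    "Virtual gallery": {"valuable": None, "usable": 8, "viable": None, "feasible": None},
}

def unknowns(scores):
    """The attributes we can't even guess at - the places to investigate first."""
    return [attribute for attribute, score in scores.items() if score is None]

for name, scores in ideas.items():
    gaps = unknowns(scores)
    if gaps:
        print(f"{name}: investigate {', '.join(gaps)}")
```

<p>The point is that the question marks, not the numbers, drive the next round of work.</p>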

<p>This exercise was quick, and we avoided diving into detail - the goal was to get through the list, so we knew where to focus our attention. We agreed collectively that the Virtual Gallery idea was a “not yet” concept. We agreed that the design and tech folk would work together on the checkout process to investigate options and find a workable solution - the value and viability were obvious. Finally, we agreed that the tech team would look at ways to deliver the “visual preview” function.</p>

<p>For areas where design <em>and</em> technology had given high scores for usable and feasible, we asked both teams to break down the idea to specific features - this ensured we didn’t have big misalignments. Each idea broke down into 6-10 features - so again, these were fairly quick conversations.</p>

<p>At this point - after spending hours, <em>maybe</em> days, but certainly not weeks, you should have a broad outline of the big ideas, and the way they break into features. Can you estimate against this? Well, you can try - and there’s certainly value in the discussion. But the estimate’s accuracy is unlikely to be high.</p>

<p>Instead, I’d rephrase the question. Instead of asking <em>“What will it take to deliver this idea/feature?”</em>, which suggests a degree of accuracy you can’t deliver at this point, you should ask <em>“What level of effort gives us a better than evens chance to deliver this idea/feature”</em>. Why is this better? Because you can answer this question much more quickly - by accepting the 50% error margin, you’re accepting the uncertainty. You avoid the urge to investigate every single edge case, and you can move much faster. When someone asks “What about ….”, you can find out if it makes a material difference to that 50% margin - in most cases it doesn’t, and you can move on.</p>

<p>The output of this process is a list of product ideas and features, with a “better than evens” estimate. For our ecommerce start-up, this process took about a week. But “better than evens” is not good enough - so we multiplied every estimate by a “risk factor”. We were pretty confident we had to deliver <em>all</em> of the features in order for the solution to be viable, so we didn’t want to risk any single feature blowing the budget - so our risk factor was high (we tripled the estimates). Your risk appetite may differ!</p>
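<p>The arithmetic is deliberately simple. Here’s a sketch with invented feature names and numbers - only the “scale the better-than-evens estimate by a risk factor” step is the actual method described above:</p>

```python
RISK_FACTOR = 3  # we tripled the raw estimates; your risk appetite may differ

# Each feature gets a "better than evens" estimate, in days, at ~50% confidence.
features = {
    "product catalogue": 15,
    "visual preview": 25,
    "basket": 5,
    "checkout": 20,
}

# The budget you actually commit to is the estimate scaled by the risk factor.
budget = {name: days * RISK_FACTOR for name, days in features.items()}

print(budget["checkout"])    # the 20-day estimate becomes a 60-day budget
print(sum(budget.values()))  # overall budget across all features
```

<p>A portfolio-wide risk factor is crude, but it keeps the conversation about risk explicit instead of hiding it inside each estimate.</p>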

<p>At the end, we had enough information to agree on the return on investment for each idea/feature.</p>

<p>We categorized them into 4 groups: “obvious wins” (high value, low cost), “hygiene factors” (high value, high cost, enabled core proposition), “obvious nos” (low value, high cost), and “further investigation”.</p>

<p>This was enough to secure our funding, and start work. We didn’t need to go into complicated work breakdown processes.</p>

<h1 id="estimates-as-non-functional-requirement">Estimates as non-functional requirement.</h1>

<p>So, that’s great - we had a shared understanding of what we were building, and had a (very) rough estimate of the effort required. We started work (lots of other things had to happen, like building a team etc. - that’s a story for another time!), and - as always - found out that our “estimates” were….wrong, even after we’d tripled the “better than evens” estimate. Or rather, we found that when we looked at individual features in detail, we uncovered different amounts of work than would fit in the estimates. Some features were <em>much</em> easier than we’d expected, some much harder. The variance had many causes - sometimes, we’d just not understood the complexity. In some cases, the design folk came up with a <em>much</em> better option that was harder to build.</p>

<p>The solution to this was <em>not</em> to re-do the estimates. It was to treat the estimate as a <a href="https://en.wikipedia.org/wiki/Non-functional_requirement">non-functional requirement</a> in the delivery process. So, rather than asking the team to design the best possible implementation of the idea (in all senses - user experience, visual design, technical design), we asked them to design the best possible implementation that <em>fit within the estimated budget</em>.</p>

<p>This is, of course, difficult - how can a visual designer assess the implementation cost of their work? How can a software engineer stick to a budget in the light of all the uncertainty they face? The good news is that while it’s difficult, it’s <em>much</em> easier than estimating accurately far in advance of having the detail. Generally, it’s easier to estimate small-ish pieces of work (features, epics) than estimating an entire application.</p>

<p>We also had teams work together during the design phase - visual designers, engineers, testers, product folk all collaborating on finding solutions that <em>fit in the budget</em>. This collaboration was incredibly powerful, and as the team built relationships and momentum, they squeezed a lot of functionality out of the time they had available.</p>

<h1 id="handling-the-unexpected">Handling the unexpected.</h1>

<p>Of course, there are cases where you simply cannot fit the work in the estimate - at least not without affecting usability or value. We handle that in the same way as many other non-functional requirements. When the team couldn’t figure it out, we had a review meeting where the team presented the options, and we collectively agreed which option we’d ask them to pursue. Sometimes, the option was “spend more time/money” - but sometimes, it was “reduce functionality”, or even “leave the feature out”.</p>

<p>This is not a magic bullet - but it makes the decisions explicit and transparent, and it takes the responsibility away from the team.</p>

<h1 id="so-whats-an-estimate-anyway">So, what’s an estimate anyway?</h1>

<p>Our approach was not perfect - in some cases, we really did get it dramatically wrong in the up-front scoping, and even after we’d agreed to change the boundaries, the team took longer than the estimate (and re-estimate). But there were very few surprises, and overall we came in within the budget we’d set.</p>

<p>So, the way I see estimates is not “what will it take to deliver this idea/feature/change?”. My way of looking at estimates is “how much time and resources would give us a reasonable chance of getting this idea/feature/change done?”.</p>

<p>This changes the emphasis from expecting certainty and perfection, and acknowledges you’re trying to assess chances, rather than certainties.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[Estimating software projects accurately is hard - especially when the project is poorly understood. And yet, it's often unavoidable. Here's a process I've used to break through that deadlock]]></summary></entry><entry><title type="html">“My boss is terrible - shall I tell _their_ boss?”</title><link href="/2022/11/09/My-boss-is-terrible.html" rel="alternate" type="text/html" title="“My boss is terrible - shall I tell _their_ boss?”" /><published>2022-11-09T15:44:43+00:00</published><updated>2022-11-09T15:44:43+00:00</updated><id>/2022/11/09/My-boss-is-terrible</id><content type="html" xml:base="/2022/11/09/My-boss-is-terrible.html"><![CDATA[<p>I am mentoring a few people, and in one of our conversations, my mentee complained about their line manager. 
“He’s terrible - every time something is even slightly off track, he turns it into a crisis. We’re constantly fighting fires - and they’re mostly imaginary!”
My mentee was feeling very stressed - being in constant crisis is exhausting, and having to switch your attention every few days because there’s another fire to put out is terrible for productivity. I sympathized, and then my mentee asked if he should raise this with their boss’s boss - the skip level, or grandboss.</p>

<p>I invited them to follow that train of thought - what will the grandboss think when they hear this feedback?
“Hm. Well, I suppose they’ll want to know what they can do to help?”
That seemed optimistic. 
“Don’t you think your grandboss has enough problems? Why would they consider this a priority? Most people have ‘my problem’ and ‘not my problem’ mental categories - why would your grandboss make this <em>their</em> problem?”
My mentee thought about this. “Well, because it’s affecting the team, and I can’t fix it”.
“OK, let’s say that is how it plays out. What’s your grandboss going to do next?”
“Uh - they’ll replace my boss, or get the training or something?”
That again seemed optimistic.
“Maybe, but don’t you think they will talk to your boss first, get their perspective?”.
I saw the lightbulb go on.</p>

<p>“Managing up” is difficult. Sometimes, you end up with a boss you can’t work well with. Early in my career, I had a boss who was incredibly warm and caring, very much focused on personal development - but they were not “technical”, and thus the technical decision making was terrible. This meant lots of rework, lots of integration problems, and a reputation with other teams that we were flakey. When I raised it with my skip level boss - a very smart, career-driven operator who went on to become the CEO - he told me he knew (we had a high level of trust after working together on some challenging deadlines).
“You think I don’t know? Your boss is a great human being, and has had a lot of success leading teams, but in this case, the work is too technical. I’m in regular meetings with your boss, and I’m aware of the problem.”
I looked at my grandboss, waiting for the next sentence. 
My grandboss looked back.
I ventured a very convincing “Eh, so…?”
My grandboss sighed. “I’m aware of it. The situation persists. What do you conclude?”.
I thought for a moment. “Either you’re working on it and can’t tell me, or you’re not working on it?”.
“Okay, let’s see why I might not work on it?”
I hated it when he did this. It was like being back at school.
“Uhm…because there’s nothing you can do, or because you don’t think it’s the most important thing to work on right now. Maybe there are constraints I’m not aware of?”.
And so I found out that, yes, my grandboss was aware of the problem, that they had a plan, but couldn’t tell me what it was, and that it would take a few months to play out.</p>

<p>I was very lucky - I had a grandboss who I trusted, and who was able to have this conversation in relative openness. They could easily have gone to my boss and discussed the feedback - and that would have made my position much harder. They could have told me to stop complaining and fix my own problems. They could have asked me if this was just personal animosity, and whether I had any specific examples. They could have insisted it was not a problem at all.</p>

<p>In short, when talking to skip level bosses about your own boss, I’d be <em>very</em> circumspect. Think through the next few steps that will follow your comments, stick to facts, and make sure you have a trust relationship.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[Is there a good way to complain to skip level bosses about your _own_ boss?]]></summary></entry><entry><title type="html">Architecture, software, and business</title><link href="/2022/08/19/Architects-again.html" rel="alternate" type="text/html" title="Architecture, software, and business" /><published>2022-08-19T15:44:43+00:00</published><updated>2022-08-19T15:44:43+00:00</updated><id>/2022/08/19/Architects%20-%20again</id><content type="html" xml:base="/2022/08/19/Architects-again.html"><![CDATA[<p>I just finished the <a href="&quot;Certified Digital Practitioner&quot; certification course from the Open Group">https://certification.opengroup.org/examinations/dpbok/dpbok-part1</a>. It was pretty good - some of the material is presented as “indisputable fact”, when reasonable people (okay, me) might disagree. And to get the certification, you have to memorize those facts. But passing exams is a thing I’m good at.</p>

<p>The final chapter of the Digital Practitioner’s Book of Knowledge is about architecture. This is described within the context of “enduring enterprise” - large, long-lived organisations with multiple products, complex environments, and challenging communications. In one of the study materials, <a href="https://www.linkedin.com/in/charlestbetz/">Charles Betz</a> describes the work he did as an architect. Very little of it involved boxes and arrows - he writes: <em>“…the architecture team was a mechanism for synchronizing across the organization”</em>.</p>

<p>This came to mind in a conversation with a former colleague, who asked for advice on the career progression for one of their reports. The person in question is a talented developer, with a strong Computer Science background, a proven track record of delivery, and a somewhat abrasive personal style. 
“He says he doesn’t want to do politics, or management - he just wants to be able to fix the technical decisions he disagrees with. He’s asked for a promotion to Technical Architect”.
This set off alarm bells. The idea that technical decisions aren’t deeply “political” is hard to reconcile with my experience. The very worst technology decision I recall was at a well-funded start-up, where the CEO “chose” the development platform because he was friends with the vendor CEO. In another case, there were two development teams who couldn’t work together for tiresome political reasons, and we had to design the entire solution to avoid any communication between the two (yes, we may have accidentally invented microservices).</p>

<p>So, we started talking about how architects “get things done”. Even in the most hierarchical organisations, an architect may have positional authority, but that rarely means much. Usually, the delivery teams have more political clout - after all, that’s where the business benefit is going to come from. In the quarter century I’ve been doing this, I don’t recall making any big decisions without support from at least some, usually most, hopefully all, of the impacted parties. I have a friend who is Chief Architect at a FTSE100 company - and they have never once decreed a solution without the consent of those affected.</p>

<p>Architecture should impact the world - otherwise, we’re just people with opinions. The impact comes not from <em>making</em> the decision, but from seeing that decision <em>carried out</em>. And carrying out the decision requires the participation of more than just the architect(s).</p>

<p>Getting things done, ultimately, means building a consensus, and an environment where people can disagree and commit. That’s hard to do by issuing mandates and decrees. It is, ultimately, political.</p>

<p>So, the developer who wants to be an architect so they can make decisions “without the politics” is in for a rude awakening. Sorry.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[Positional authority, persuasion, communication - what makes an architect effective?]]></summary></entry><entry><title type="html">The Pyramid Principle</title><link href="/2022/06/22/Pyramid-principle-redux.html" rel="alternate" type="text/html" title="The Pyramid Principle" /><published>2022-06-22T15:44:43+00:00</published><updated>2022-06-22T15:44:43+00:00</updated><id>/2022/06/22/Pyramid-principle-redux</id><content type="html" xml:base="/2022/06/22/Pyramid-principle-redux.html"><![CDATA[<p>The Pyramid Principle boils communication down into a simple, top-down structure which forces you to organise your thinking, provides senior folk with a logical flow, and makes sure the most important point stands out.</p>

<h2 id="the-audience">The audience</h2>

<p>The more senior the person, the less likely they have time or attention to spare. If you send them an email, it may linger in their inbox for hours or days. If you give a presentation, they may be messaging on their phone. If they do pay attention, they are almost certainly most interested in the way your message relates to their interests. What do you want them to do? Why should they do it?</p>

<p>So, the Pyramid Principle requires you to start with the action you want them to take. I try to make that the subject of my emails - so, instead of “Update on the XYZ situation”, I aim for “Please approve extra budget for XYZ so we can unlock benefit ABC”.</p>

<p>This sounds easy - but it means you have to do the work to clearly state the recommendation, and that can be difficult.</p>

<h2 id="the-pyramid">The pyramid</h2>

<p>The recommendation, solution, request, whatever, is the top of the pyramid.</p>

<p>You support that with 3 or so mutually exclusive and collectively exhaustive items that make up the recommendation/solution. They typically answer questions like “why”, or “how”. You can keep these brief - a paragraph or two. This is the middle of the pyramid.</p>

<p>The base of the pyramid is the further information supporting the “why”, “what alternatives exist” or “how” questions.</p>

<p>If you’re writing this in a word processor or similar, you can use the Outliner tool to help structure your thoughts - I’ve got into the habit of using this.</p>

<p>As an example, I might do something like this:</p>

<h1 id="please-approve-extra-budget-for-xyz-so-we-can-unlock-benefit-abc">Please approve extra budget for XYZ so we can unlock benefit ABC</h1>
<h2 id="benefit-abc-will-bring-in-additional-revenue">Benefit ABC will bring in additional revenue.</h2>
<h3 id="customers-will-pay-more-for-abc">Customers will pay more for ABC</h3>
<h3 id="abc-will-attract-new-customers">ABC will attract new customers</h3>
<h2 id="the-team-have-a-credible-plan-to-deliver-xyz">The team have a credible plan to deliver XYZ</h2>
<h3 id="we-already-have-an-api-for-some-of-the-features">We already have an API for some of the features</h3>
<h3 id="our-suppliers-can-provide-additional-support">Our suppliers can provide additional support</h3>
<h3 id="weve-got-a-high-confidence-estimate">We’ve got a high-confidence estimate</h3>
<h2 id="it-fits-in-the-roadmap">It fits in the roadmap</h2>
<h3 id="the-product-team-consider-this-capability-strategically-important">The product team consider this capability strategically important</h3>
<h3 id="its-on-the-roadmap-for-next-year-but-we-can-pull-it-forward-with-extra-budget">It’s on the roadmap for next year, but we can pull it forward with extra budget</h3>

<p>By using the various heading styles to outline the argument, I can see whether I’m repeating myself, whether I’m contradicting myself, and where I need to clarify the argument. Doing this before writing the actual content helps me cut down the time it takes to write the recommendation.</p>

<p>I got into this habit many years ago - and I’ve found it helps in conversation, as well as in written communication.</p>

<h2 id="situation---complication---question---answer">Situation - Complication - Question - Answer</h2>

<p>The final recommendation from the Pyramid Principle is to start the top of the pyramid with SCQA (situation - complication - question - answer).</p>

<h3 id="situation">Situation</h3>

<p>A brief description of the situation shows you understand the context in which you’re operating - that you understand the audience and the world they live in.</p>

<p>Remember how I started this post with:</p>

<blockquote>
  <p>You’re an expert in your field - a software engineer, perhaps - and you have lots of insights to share.</p>
</blockquote>

<h3 id="complication">Complication</h3>

<p>But you wouldn’t have a recommendation if everything were hunky-dory! So, you describe the challenge you face.</p>

<p>At the start of my post, I wrote:</p>

<blockquote>
  <p>But you find it difficult to get senior management to act on your ideas.</p>
</blockquote>

<h3 id="question">Question</h3>

<p>To address that challenge, you try to boil it down to a single question that will (help to) remove the complication.</p>

<blockquote>
  <p>What can you do to improve?</p>
</blockquote>

<h3 id="answer">Answer</h3>

<p>Once you’ve phrased the question, you can provide an answer.</p>

<blockquote>
  <p>You can learn to apply the Pyramid Principle to your communication style.</p>
</blockquote>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[You're an expert in your field - a software engineer, perhaps - and you have lots of insights to share. But you find it difficult to get senior management to act on your ideas. What can you do to improve? You can learn to apply the Pyramid Principle to your communication style.]]></summary></entry><entry><title type="html">Resilience, efficiency, hierarchies and networks</title><link href="/2021/09/09/resilience-versus-efficiency.html" rel="alternate" type="text/html" title="Resilience, efficiency, hierarchies and networks" /><published>2021-09-09T15:44:43+00:00</published><updated>2021-09-09T15:44:43+00:00</updated><id>/2021/09/09/resilience-versus-efficiency</id><content type="html" xml:base="/2021/09/09/resilience-versus-efficiency.html"><![CDATA[<p>I was chatting with a friend the other day about the changes we’ve seen over the last few years, and how it doesn’t feel like we’ll go back to the status quo ante.</p>

<p>One of the things we both noticed is that the demand for “simple, clear communication” is at an all time high, and that many leaders seem to want to fill that demand. It’s equally clear that the systems we all depend on are visibly more complex - we’re seeing all sorts of <a href="https://jamesstuber.com/second-order-effects/">second-order effects</a> from pretty much everything. Global pandemic -&gt; fewer driving tests -&gt; shortage of HGV drivers -&gt; shortage of goods in shops.</p>

<h2 id="manufacturing-supply-chains-logistics">Manufacturing, supply chains, logistics.</h2>

<p>The last 200 years - but especially the era since the 1980s - have brought incredible improvements in manufacturing, which in turn have brought down costs for most consumer goods. The most expensive physical purchases most households make other than their home (car, TV, white goods) have all either stayed flat (compared to inflation) or declined in price. Cars cost basically the same as they did in 1990 - though the quality is much higher. TVs cost a fraction of what they cost in 1990.</p>

<p>This improvement has come through technological advancements - but mostly through logistics. Most of these items are no longer “manufactured” in the way we imagine - factory halls, with workers and robots creating items from raw steel, polymers, etc. Instead, they’re assembled from specialist suppliers who offer the best price/quality trade-off. Those specialist suppliers, in turn, have their own supply chains; the combination of global markets, sophisticated logistics and instant information sharing has created supply chains bringing the best, cheapest products together in just-in-time operations which require minimal capital investment in stocks, allow businesses to focus on core competences, and deliver products with great efficiency.</p>

<p>Those advances in logistics allow other businesses to operate more efficiently too - retailers hold far less stock than they used to. A building site near me is bringing bricks - low value, high weight - from Belgium (presumably because the price is still better than buying from the UK). My independent corner shop has goods from Poland, Turkey, India, the Philippines, and South America.</p>

<h2 id="finance">Finance</h2>

<p>The second trend is the change in finance since the Reagan/Thatcher years. Most middle class people now have a pension pot which they invest at their own risk, and - in very broad terms - interest rates have lagged behind inflation. The stock and bond markets have been financialized to the point they barely reflect the real economy, and there’s a <em>lot</em> of money looking for a return.</p>

<p>This has provided finance for start-ups, funded whole new industries, and helped to achieve those efficiencies I described earlier.</p>

<p>But this capital is also desperately looking for safe, guaranteed income streams, and many of the institutions that allow our society to function can provide exactly that. Once you’ve signed up a customer, if you’re a utility, a telco, a transport company, an insurance company - that customer represents a fairly certain future revenue stream. All the low-hanging fruit has been picked, though - and there’s more money than ever looking for a return thanks to the monetary easing since 2008.</p>

<p>And once an organisation becomes a financial asset, it has a legal duty to optimize shareholder value. And that means doing nothing that isn’t legally required. The era of deregulation means that this becomes an ever-lower bar.</p>

<h2 id="brexit">Brexit</h2>

<p>Lots of points of view on Brexit, but it’s hard to argue that it imposed barriers on collaboration between UK organisations and those in the European Union. While the barriers are (mostly) not about tariffs, they <em>are</em> about paperwork, which introduces friction. In a process where we’re optimizing the last few pennies out of the bill of materials for a washing machine, that friction can matter…</p>

<h1 id="fragility">Fragility</h1>

<p>So, what have we seen? Mostly, the big systems in the U</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[We've become incredibly efficient in many areas of life. But that efficiency comes at the cost of resilience. How can we adapt?]]></summary></entry><entry><title type="html">Software architects and engineers - where do you draw the boundary?</title><link href="/2021/08/22/Architects,-engineers-and-boundaries.html" rel="alternate" type="text/html" title="Software architects and engineers - where do you draw the boundary?" /><published>2021-08-22T15:44:43+00:00</published><updated>2021-08-22T15:44:43+00:00</updated><id>/2021/08/22/Architects,%20engineers%20and%20boundaries</id><content type="html" xml:base="/2021/08/22/Architects,-engineers-and-boundaries.html"><![CDATA[<p>I was talking to a friend about how software teams are structured, and specifically what the role of software architects is. It’s a fruitful topic of conversation - because it helps expose assumptions about teams, processes, and technologies. My friend has recently moved from the title of “software architect” to “technical lead”, as well as changing jobs.</p>

<p>So, here’s what I’ve seen in the ways organizations use “architect”.</p>

<h2 id="architects-as-super-coders">Architects as super coders.</h2>

<p>The simplest case is that the organization runs out of career track for software engineers, and gives them a cooler-sounding title once they’ve got to the end of that track. In this case, software architects are generally exceptional coders, with deep understanding of the business - but their day job generally remains mostly coding. While this is inelegant, it’s perfectly valid - and I’ve seen it work well as a way of retaining engineers. Of course, the downside is that there is no deliberate attention to the other work an architect might do, or the skills required to do that work.</p>

<h2 id="architects-as-design-authority">Architects as design authority</h2>

<p>The other case is that the software architect takes every major decision on a project - they draw UML diagrams, and the engineers’ role is merely to turn those diagrams into code. One client who operated this way told me they got great productivity from relatively junior developers this way, and that it was a great way to ensure a consistent approach across a complex application. The architect was incredibly smart, but became a bottleneck fairly quickly; they then extended the architecture team, with each architect owning a microservice. That worked well - but once the team exceeded 5 or 6 micro services (and therefore architects), the coordination between the architects became problematic.</p>

<p>More worryingly, it became obvious that while the engineers had been efficient in turning the designs into working code, that code was not always connected with actual use cases - this was an agile team, with regular prioritization and feature pruning, and the architects were often unable to keep up with those changes, so the team built what was designed - but not always what was agreed with the business.</p>

<h3 id="boundaries-of-design">Boundaries of design</h3>

<p>With architects as the design authority, the obvious question is “where does design end? How much detail should architects get into?”.</p>

<p>What I’ve seen work well is architects owning the structure of the major subsystems and interactions between those subsystems. A “subsystem” in this model might equate to a microservice, or a “bounded context” in DDD. Agreeing how to carve the overall solution up into subsystems, how those subsystems should interact, and what the major architectural choices for those subsystems are (development language/framework, deployment architecture, build-and-deploy strategy etc.) definitely requires architectural thinking.</p>

<p>However, within those subsystems, I’d want the delivery team to be the primary decision makers, figuring out detailed implementation decisions as they go. Architects can advise and offer a governance structure - but they quickly become bottlenecks, and - more importantly - deprive the delivery team of agency. The delivery team should own the delivery and implementation choices - this drives up quality, improves maintainability, and preserves the team’s agency.</p>

<h2 id="architects-as-the-owner-of-risk">Architects as the owner of risk</h2>

<p>I used to run an architecture team at a digital agency. My approach was shaped by the circumstances - our projects were rarely very large (the largest had around 60 developers for 12 months), and were usually not at the very core of our clients’ organization (unlike SAP, for instance).</p>

<p>My approach was get the architects to <em>reduce risk for the delivery teams</em>.</p>

<p>Sometimes, that risk came from areas traditionally associated with software architecture - a complex application structure, lots of integration points, communication with many technical stakeholders; we drew <em>lots</em> of UML diagrams, had lots of meetings and conference calls, filled endless whiteboards.</p>

<p>Quite often, however, the overall design of the solution was not particularly risky - a web application, using a well-understood framework, with perhaps some SaaS integration points. In those cases, the architect would discuss the approach with the development team, but act as a sounding board and facilitator, rather than the design authority.</p>

<p>Those well-understood web applications, however, would regularly have some attribute which required dedicated risk mitigation. For instance, an ecommerce application with very short, very high load predictions (think selling tickets to one-off events with world-famous singers). Or we’d have to integrate a mobile app with a brand new IoT device, using a protocol that could be described as “under heavy development”. Or a user experience which navigated incredibly complex underlying data structures - but had to hide that complexity from the end user.</p>

<p>In those cases, I’d get the architects to take ownership of the high-risk attribute of the project, and integrate it with the “regular” development team with the minimum of friction. That meant the teams would be able to make progress on the 80% of the project that was known and understood, and the architect was responsible for the 20% that was risky and confusing.</p>

<h2 id="architects-as-an-investment-in-the-future">Architects as an investment in the future</h2>

<p>The final mental model I like concerns time horizons. Every software project is a trade-off between short-term delivery versus long-term survival. Alastair Cockburn talks about <a href="https://blog.codinghorror.com/software-development-as-a-collaborative-game/">software development as a collaborative game</a>, one of whose goals is to get to play again. It’s not enough to be fast, have lots of features, a cool UI - you have to survive so you can play again.</p>

<p>And this is much easier if you can ask different people to worry about different things. I’ve seen engineering leads spin their wheels for days, and reverse decisions repeatedly, because they had to worry both about “getting the thing out the door” and “making sure we can handle this future, somewhat hypothetical, use case”.</p>

<p>Some of my clients have solved this by asking delivery teams to focus primarily on the short-term delivery challenges, and architects on looking at the future. Concerns for the future usually involve high-level system design and overall implementation quality. But they also involve looking at the product roadmap and figuring out how to deliver key new capabilities. “Our current release only handles one payment method, but in 6 months we need to handle 5 - let’s make sure we don’t have to throw away today’s version”.</p>

<h3 id="architects-and-the-sdlc">Architects and the SDLC</h3>

<p>It’s worth discussing the role of the software development lifecycle (SDLC) in this context. A key factor in software quality and maintainability is the way the team converts requirements into running code. A single developer, pushing code straight to production may be fast, but unit tests, static code analysis, BDD-style acceptance tests, peer reviews etc. will probably improve the quality (and thus reflect an investment in the future).</p>
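<p>To make the BDD point concrete, here is a minimal given/when/then style acceptance test in Python. The <code>Basket</code> class is invented purely for this sketch, and a real team would likely use a framework such as pytest or behave rather than a bare function.</p>

```python
# A minimal given/when/then acceptance test, in the BDD style.
# `Basket` is a made-up class, defined inline so the example runs.

class Basket:
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    @property
    def total(self):
        # Total is the sum of the prices of all items in the basket.
        return sum(price for _, price in self.items)


def test_adding_an_item_updates_the_total():
    # Given an empty basket
    basket = Basket()
    # When the customer adds a book costing 12.99
    basket.add("book", 12.99)
    # Then the basket total reflects the new item
    assert basket.total == 12.99


test_adding_an_item_updates_the_total()
```

<p>The comments carry the given/when/then narrative, and the assertion is the acceptance criterion - which is the point of the style: the test documents behaviour the business cares about.</p>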

<p>I’ve seen many architects shy away from these process questions - but I believe they’re a key aspect of “Architects as an investment in the future”. They ensure that the team build the muscle memory to deliver high quality software, but also that the process is tuned to the demands of the project.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[What's the role of an architect in a modern software development project? Here's what I've learnt...]]></summary></entry><entry><title type="html">Software teams - what does “good” look like?</title><link href="/2021/08/22/What-does-good-look-like.html" rel="alternate" type="text/html" title="Software teams - what does “good” look like?" /><published>2021-08-22T15:44:43+00:00</published><updated>2021-08-22T15:44:43+00:00</updated><id>/2021/08/22/What-does-good-look-like</id><content type="html" xml:base="/2021/08/22/What-does-good-look-like.html"><![CDATA[<h1 id="what-does-good-look-like">What does “good” look like?</h1>
<p>A question I hear often from clients, colleagues and friends is “what does a good software delivery team look like?”. This can feel a bit like football fans comparing the players in their team with some Platonic ideal (“A decent goalkeeper would have stopped that!”) - or, indeed, with their grandmother (“My nan could have scored that goal!”). 
And while most fans would agree the primary purpose of a football team is to win competitions, they often talk about intangible aspects at least as much - “They have spirit”, “They play entertaining football”, “A real community club”.</p>

<p>Similarly, when talking about software teams, the primary purpose is pretty clear - deliver the software the organisation requires. It’s harder to measure than “winning a competition” - is your team more or less efficient than your competitor? Is your quality higher or lower?</p>

<p>And of course, we see <a href="https://blog.crisp.se/wp-content/uploads/2012/11/SpotifyScaling.pdf">lots</a> of <a href="https://www.oreilly.com/library/view/software-engineering-at/9781492082781/">stories</a> about <a href="https://www.theguardian.com/technology/blog/2010/nov/22/facebook-developer-life-inside">software teams</a> in the big tech companies - as well as books like <a href="https://www.goodreads.com/en/book/show/35747076-accelerate">Accelerate</a>, which clearly shows how focusing on four metrics allows software teams to deliver value.</p>
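<p>As a rough illustration of those <em>Accelerate</em> metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service), here is a minimal sketch in Python. The deployment records are invented for the example - a real team would pull them from CI/CD and incident tooling - and it computes only the first three metrics.</p>

```python
from datetime import datetime, timedelta

# Invented sample data: (commit_time, deploy_time, deployment_failed).
deployments = [
    (datetime(2021, 8, 2, 9, 0), datetime(2021, 8, 3, 14, 0), False),
    (datetime(2021, 8, 9, 10, 0), datetime(2021, 8, 10, 9, 30), True),
    (datetime(2021, 8, 16, 11, 0), datetime(2021, 8, 16, 17, 0), False),
    (datetime(2021, 8, 23, 9, 0), datetime(2021, 8, 24, 12, 0), False),
]


def dora_snapshot(deployments, period_days=28):
    """Deployment frequency, average lead time, and change failure rate."""
    # Lead time: elapsed time from commit to deployment.
    lead_times = [deploy - commit for commit, deploy, _ in deployments]
    avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)
    # Frequency: deployments per week over the observation period.
    deploys_per_week = len(deployments) / (period_days / 7)
    # Change failure rate: share of deployments that failed.
    failure_rate = sum(1 for *_, failed in deployments if failed) / len(deployments)
    return deploys_per_week, avg_lead_time, failure_rate


per_week, lead, cfr = dora_snapshot(deployments)
print(f"{per_week:.1f} deploys/week, average lead time {lead}, {cfr:.0%} failed")
```

<p>Even this toy version shows why the metrics are attractive: they come straight from delivery data, rather than from estimates the team can game.</p>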

<p>But - as ever - it’s not straightforward. “Good” is a trade-off between many different dimensions.
I will discuss some of these below.</p>

<blockquote>
  <p>When I talk about a “team” below, I am referring to a “2-pizza” size team of 6-10 people. So, in a team of 100 engineers, I’d expect to have 10-15 teams, and each would hit most if not all of the characteristics below.</p>
</blockquote>

<h1 id="the-dimensions">The dimensions</h1>

<h2 id="throughput">Throughput</h2>
<p>Throughput - the amount of software the team can deliver in a given time - is one of the biggest pain points for software teams. There is no consistent way to measure this dimension - “story points per sprint” is intended to help the team improve the accuracy of estimating, not as an objective evaluation criterion. I’ve seen teams “optimize” this metric simply by estimating more points for a given piece of work.</p>
<h3 id="time-to-value-beats-throughput">Time to value beats throughput</h3>
<p>Often, when I ask business stakeholders about their perception of throughput, they say things like “We prioritized a straightforward change, and it took 6 months to be delivered”. The <a href="https://boards.straightdope.com/t/is-this-the-most-intellectual-joke-in-the-world/99894">pedant</a> in me immediately jumps to “that’s not throughput, that’s flow”. 
The other common complaint is something like “we’ve been working on this big project for ages now, and it feels like it will never be finished”. That’s much more about “time to value” and “alignment” than about the raw throughput of the team.</p>

<blockquote>
  <h4 id="good-throughput">Good throughput</h4>
  <p>A “good” team delivers business value regularly, and quickly. Typically, this means new features are released every few weeks, and high priority requirements flow to production in weeks, rather than months.</p>
</blockquote>

<h2 id="predictability">Predictability</h2>
<p>A closely related complaint is “we can’t tell when things will be done. Everything seems to take much longer than we expect, and we don’t understand why”. 
This comes back to one of the problems with “Agile” - most organisations need some kind of bounds on the cost versus benefit of a project. “We don’t know how long it will take” makes the cost denominator of the return on investment calculation unknowable. 
This becomes much more of a challenge when the team is signing up to “large” (&gt; 6 month) commitments, or facing “large” (&gt; 20%) variations in delivery costs. 
And as the organisation and its technology becomes more complex, teams become less predictable because they have to navigate dependencies at design, build and run time. “We can’t finish our work until that team finishes theirs” can damage business confidence in the team.</p>

<blockquote>
  <h4 id="good-predictability">Good predictability</h4>
  <p>Good teams make small promises, isolate themselves from dependencies, and build a steady “release to production” routine. 
Nearly every “good” team I’ve seen in action routinely delivers to production several times per month. They tend to be fairly restrained with their delivery commitments, and focus heavily on isolating themselves from dependencies.</p>
</blockquote>

<h2 id="relationships-and-alignment">Relationships and alignment</h2>
<p>Software teams tend to be “different” to the rest of the organisation - even in technology companies. They tend to have a slightly different culture - I worked with a software team in a finance company where the developers wore hoodies, and everyone else wore a suit. They often have different jargon, and different peers. It’s not uncommon for the software team to have career privileges the rest of the organisation doesn’t - salary levels, for instance, may be higher.
It’s also not uncommon for software teams to become a little isolated, or to think that every problem has a software solution. Individuals may appear arrogant, and people may start to focus on the behaviours of the team, rather than the outcomes - in one case, I had a client whose best engineer was put on a disciplinary program because they were routinely 20 minutes late into the office.</p>

<p>Another problem is that development teams may drift away from the overall organisation’s priorities. In one case, I saw a development team in a struggling retail company embark on a large-scale, strategic re-architecture programme, while the rest of the business was cutting costs, and desperately looking for ways to survive. The re-architecture was justified and necessary - but the timing was terrible.</p>

<p>In a positive example, one client had a weekly team update where they linked the previous week’s commercial performance with work done by the teams - “New feature X resulted in a 3% uplift in metric Y”.</p>

<blockquote>
  <h4 id="good-alignment">Good alignment</h4>
  <p>Good teams maintain a link between the organisation’s priorities and their own roadmap. They can explain how they contribute to the mission, and maintain good personal relationships with key stakeholders.</p>
</blockquote>

<h2 id="quality">Quality</h2>
<p>The next dimension is “quality”. A “good” team delivers software with “good” quality. “Good” is relative - a pacemaker has different quality goals than a casual mobile game. But when business stakeholders complain about quality, it’s often framed as a very specific, long-running complaint. In one case, the client complained that the website “has always been slow”. Everything else was great - but the site performance was a bugbear. The team had worked on it, but not given it enough attention. In another case, the team had let a bug slip through to production which caused a small number of orders to be priced incorrectly - the team had noticed and fixed the defect within hours, but the press had got hold of it, and the business was acutely embarrassed.</p>

<blockquote>
  <h4 id="good-quality">Good quality</h4>
  <p>Good teams deliver good quality - but they pay special attention to the issues the business stakeholders care about most.</p>
</blockquote>

<h1 id="time---past-present-unknowable-future">Time - past, present, (unknowable) future</h1>
<p>The most important thing about “good” is that you can’t get there in one go, and that it doesn’t stay the same over time. Some of the best teams I have worked with started out…well, not quite so good. 
The single most important aspect of “good” is the ability to learn and improve over time.</p>

<p>At one client, there was an “exemplar” team. They’d been established early on, as a way of demonstrating “what good looks like”. The team lead was great - considered, experienced, pragmatic. The engineers were all very experienced, and worked well together. They released their first feature to production after just 6 weeks, and their product area had terrific feedback - and the business stakeholders loved working with them.</p>

<p>After a few months, though, the client said they wanted to “shake up the team”. They’d been doing great work, made many incremental improvements - but the other teams had caught up, and they’d gone from “exemplar” to “middle of the pack”.</p>

<blockquote>
  <h4 id="good-teams-keep-improving">Good teams keep improving</h4>
</blockquote>

<p>Sustainable, consistent improvement is hard. The good teams I have worked with have managed it - usually by focusing on making one change at a time, based on empirical feedback, and keeping that routine going over months and years.</p>]]></content><author><name>Neville Kuyt</name></author><summary type="html"><![CDATA[When we look at software teams and the way they work, "good" can be hard to capture. Here's what it looks like for me.]]></summary></entry></feed>