What Apollo 11 teaches us about software project deployments

In this 50th anniversary year of the first moon landing, everyone seems to be talking about that achievement and those first steps. That includes the BBC World Service’s 13 Minutes to the Moon podcast, which follows Neil Armstrong and Buzz Aldrin from the moment their lunar module separates from the command module, piloted by Michael Collins, through powered descent and onto the surface. Using interviews, archive footage and those all-important communication loops, the podcast goes into detail about the trials of those final moments, including fuel shortages, missed landing sites, computer overloads and communication failures.

At each of those steps, mission control had two options: Go, or No Go.

  • Fuel shortages were known and calmly acknowledged, and communication was optimised by clearing the loop of everything except the essential fuel status reports. Mission control continued to call Go, basing the decision on a stopwatch that estimated the remaining fuel. Response: Optimised communication and a secondary control when the primary control appeared unreliable.
  • The landing site was overshot because the lunar module’s horizontal speed was too high. Neil Armstrong took manual control so that, rather than relying on the guidance computer alone, he could steer to a safe landing site. Response: Manual oversight by a human being, who is often better able to respond effectively to unfamiliar circumstances (though only after significant training and experience).
  • The guidance computer repeatedly raised “1201” and “1202” alarms mid-descent. These were triggered automatically because the computer was overloaded and could not schedule and execute all of its tasks fast enough. Through rapid analysis, mission control identified their cause and allowed the landing to continue despite four of these alarms. Response: Pre-planned response*; better to use time before the mission than during it.
  • Communication was patchy. On top of the voice loop was the all-important data feed, which sent essential telemetry back to Mission Control. No telemetry meant not enough data to continue the mission safely. However, whilst each Flight Controller had the option to call No Go and abort the landing, each had enough confidence in the data they did have to let the landing continue. Response: Intuitive confidence in known systems, based on expertise and an assessment of risk.

* Technically, these alarms weren’t pre-planned for, but others were. In this case, the 1201/1202 alarms were quickly analysed and a response returned.

In each of these examples, the team failed forward. Risks were analysed and accepted, and the risk of aborting was judged higher than the risk of continuing. Although the crew had practised and simulated an abort, that scenario could never be robustly tested. Instead, they chose to acknowledge the variables they did know and work with those, mitigating where required.

Apply that to an IT project. Whilst I’ve never been involved in such a dramatic “Go/No Go” scenario, nerves can be high and failure can be damaging. When things don’t look right, the same call still needs to be made: do we rollback?

No.

In almost every situation I’ve been involved in, the risk of rolling back has been greater than that of “fixing forward”: thinking on your feet and rapidly mitigating and fixing issues while the business continues to operate. Past a certain point, too much has changed, and rolling back risks losing everything that has happened between the deployment and the “Go/No Go” decision.

Fix it forward in the short term

Whilst one should still humour requests with “yes, we have a back out plan” to settle management nerves, another question worth asking of a potentially failed deployment is “have you got a fail forward plan?”. What are the most likely ways the deployment could fail? How will you spot them? Will you spot them quickly enough? Is there resource allocated to monitor post-deployment status and react outside of Business as Usual (BAU)? Is there a strong communication channel between users and potential problem solvers?

After deployment, a retrospective would allow not only the deployment to be analysed, but also the responses to the deployment, successful or not. A team can learn from the before, during and after in this retrospective to increase the likelihood of success in future deployments – including the ability to fail forward, accepting, managing and mitigating risk as you go.

This fail forward approach would probably strike fear into any project manager, though it can fit within an agile project very well if the risk is accepted and handled. Whilst it would be ideal to test every possible scenario before a deployment, even within an agile framework you cannot predict everything. A managed, phased roll out can therefore increase exposure gradually, with rapid fixes and response plans waiting in the wings if required. Each release is risky, but a small change can carry less risk than rolling back a large one.
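As a rough illustration of a phased roll out with a Go/No Go call at each stage, here is a minimal Python sketch. It assumes you have some way to measure an error rate once a given percentage of users are on the new release; `error_rate_at`, the stage percentages and the 5% threshold are hypothetical placeholders, not part of any particular deployment tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StageResult:
    percent: int   # share of users on the new release at this stage
    go: bool       # True if the stage passed its Go/No Go check


def phased_rollout(
    stages: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> List[StageResult]:
    """Widen exposure stage by stage, stopping at the first No Go.

    error_rate_at is a hypothetical hook: given a percentage of users,
    it returns the error rate observed once that share is on the release.
    """
    results: List[StageResult] = []
    for percent in stages:
        observed = error_rate_at(percent)
        go = observed <= max_error_rate
        results.append(StageResult(percent, go))
        if not go:
            # No Go: stop widening exposure. Only this slice of users is
            # affected, so fixing forward (or a small, targeted rollback)
            # stays manageable.
            break
    return results


# Example: errors only appear once half the users are on the new release.
results = phased_rollout(
    stages=[5, 25, 50, 100],
    error_rate_at=lambda p: 0.01 if p < 50 else 0.08,
    max_error_rate=0.05,
)
for r in results:
    print(f"{r.percent:>3}% -> {'Go' if r.go else 'No Go'}")
```

In this example the roll out stops at the 50% stage. The design choice is to halt rather than revert: a failed stage caps the number of affected users, which is the same trade-off as fixing a small change forward versus rolling back a large one.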

But the people who can solve problems fast aren’t necessarily close enough to the front line to be able to respond to failure. Indeed, some standards mandate that developers should not be allowed anywhere near a production – or even a test – environment.

Continuous deployments in the long term

Of course, fixing forward should never become a standard element of a deployment task. This post is about recognising its role in still being able to deliver value from a failing deployment by recognising, accepting and managing risk. But this has to be balanced against an increasingly regulated commercial environment, which requires teams to accept restrictions and requirements imposed by the likes of the US FFIEC, Sarbanes-Oxley or the EU’s GDPR. Technology, project methodologies, workflows and processes increasingly provide a greater level of comfort before a roll out, an approach commonly known as “DevOps”. Containers help manage test and deployment environments and configurations, Continuous Integration separates developers from test platforms, and test-first development patterns help identify failures before they get into source control. All of these point to an Agile approach to software development and project delivery.

If an environment can provide not only the usual technical requirements:

  • Test-first development
  • Continuous integration from release branches, proven by testing
  • Automated deployments into environments, e.g. test, pre-production, production

… but also the essential cultural requirements:

  • strong communication and trust between developers and administrators
  • emergency workflows/pathways that permit exceptional project responses, circumventing change controls in order to expedite responses
  • management acceptance of, and trust in, the first responders – often developers

… then one could shoot for the moon: rapidly built, automatically tested, integrated code deployed into a production environment.

Introducing Agile to the corporate dark matter development team

The challenges facing an Agile team within a traditionally managed, hierarchically structured organisation are discussed extensively in the research literature. Artefacts of the burgeoning hierarchical organisation, such as company policy (“we don’t do it that way here”), the imposition of arbitrary and incompatible standards, command-and-control management and poor project manager integration, all contribute to inhibiting the operationalisation of Agile (1, 2, 3).

Alongside this research, I have come across some of these challenges myself, along with a few others surrounding the personal nature of the team. As a new member of a team, freshly indoctrinated by a fine Scrum trainer, I could easily have lost friends and disenfranchised people had I preached to my new team about how they were doing it wrong and how Scrum would fix everything.

But this is not about rehashing problems. Instead, I’ve compiled some lessons which may help others.

In introducing these, I could conclude right now with … Take it slowly

Understand why the team do what they do

It would be ignorant and offensive to assume that archaic processes continue to be followed due to a lack of awareness or skill. There is likely to be a valid reason why release forms are filled out, why users don’t get engaged, or why solutions are dictated before the problem is understood. Taking some time to observe and understand through casual discussion is an ideal way to draw out titbits of information that could explain the history of internal practices and procedures. Perhaps the reason is long forgotten or no longer relevant. Judgement should be withheld, lest you be judged yourself.

Prepare your tools

Creating a new process is much easier if there are tools surrounding it that allow the team to fall into the pit of success. I’ve used JIRA in these situations as a first step in formalising tasks, from which a backlog can be created. As requirements come in, distilling them into User Stories (using the spirit of the terminology if not the definition) and sharing them out to the team, without the dogma of story points, retrospectives or stand-ups, has allowed a degree of task ownership and collaboration to emerge through the medium of the tool. JIRA has an API, so if the wider organisation insists on its legacy tool being used, JIRA can be integrated within a wider process.

Embrace, Extend …

Stopping short of the full phrase reputedly used internally by Microsoft, I’ve found that it is the team that determines the success of Agile, not the method. I’ve therefore embraced Agile methods such as Scrum and extended them, or rather evolved them, according to the personal nature of the team. During forming, storming and norming, personalities within the team are explored within a professional context, which has allowed me to learn a lot about the individuals and their motivations. Attempting to mandate process according to Scrum doctrine will only extinguish Agile all the more quickly, and perhaps professional relationships within the team along with it.

Know the assumptions of your method

Methods such as Scrum make sweeping generalisations across industries and fail to state their assumptions. Outside of software development companies, multi-project developers who must also balance ongoing support requirements are widespread. Scrum, and the admittedly useful and revealing metrics that fall out of it, assume that developers spend 100% of their time on the project’s code. It’s all about the Story Points, NUTs or even breeds of dog (“that will be 2 Labradors and a German Shepherd”), but this does not take into account time spent outside of the project. At any given moment, the **** could hit the fan, robbing the project of velocity that would be unaccounted for in the retrospective reports. Equally, how well does Scrum work within a portfolio? Models like SAFe claim to allow multiple projects to be managed using Scrum-like techniques, but those projects do not necessarily draw on the same contended resources.
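One way to make that hidden cost visible is to record the days lost to support alongside story points, then calculate velocity against the time actually spent on the project. The Python sketch below is my own illustration rather than anything Scrum prescribes; the `Sprint` fields and the developer-day unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    points_completed: float  # story points (or NUTs, or Labradors) delivered
    developer_days: float    # days the team was nominally available
    support_days: float      # days lost to support and other projects


def project_velocity(sprint: Sprint) -> float:
    """Points per developer-day actually spent on the project."""
    project_days = sprint.developer_days - sprint.support_days
    if project_days <= 0:
        raise ValueError("no project time left in this sprint")
    return sprint.points_completed / project_days


def forecast_points(velocity: float, developer_days: float,
                    expected_support_days: float) -> float:
    """Forecast the next sprint, deducting the expected support load up front."""
    return velocity * max(developer_days - expected_support_days, 0.0)


# Two sprints deliver the same 20 points, but the second lost 10 days to support.
quiet = Sprint(points_completed=20, developer_days=30, support_days=0)
noisy = Sprint(points_completed=20, developer_days=30, support_days=10)
print(project_velocity(quiet))  # roughly 0.67 points per project day
print(project_velocity(noisy))  # 1.0
print(forecast_points(project_velocity(noisy), 30, 15))  # 15.0
```

The raw sprint totals look identical, but the noisy sprint shows the team was more productive in the project time it actually had. Deducting the expected support load before committing to a sprint gives a more honest forecast.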

 

The key to the success I’ve seen has been to introduce Agile principles gently. No big switch was needed to make every project Agile at once. Each project slowly became more Agile than the last, and the surrounding individuals (managers, Business Analysts, etc.) were understood first and introduced second. The Business Analyst is perhaps your greatest chance of successfully implementing Agile concepts within the hierarchical organisation, so they need to be brought on board at their own pace. I’ve learned to listen and understand first; from there, one can start to make the case using highly adaptive models that are made to fit the team, not vice versa.

 

Keeping projects warm in multiple project teams

Whilst the project management gurus and Agilistas talk about methods such as Scrum as if developers only ever sit on a single project, the reality is that many don’t. Developers are charged with concurrent projects, each with its own constraints, which require them to schedule their own development effort according to resource availability, or even according to what seems easier to focus on at the time. On top of this, the inevitable support obligations of previously delivered applications will chip away at development time. Sure, this is not an ideal situation and is known to inhibit Agile, but it is often the reality for developers who aren’t working within sexy start-ups with a single product or within the larger software houses.

Within this multi-project, support-intrusion reality, it is easy to fall into the trap of starting projects with much enthusiasm, only to see the initial sprints turn into a drudge as developers fight to keep the project, and their own enthusiasm, alive amidst the chaos of a demanding multi-project portfolio. Priorities change, resources are re-allocated and interest wanes across the business.

My suggestion in this circumstance is to keep project iterations small, and keep them warm.

Consider this example team, with their own project portfolio.

Figure: Agile-Lean team portfolio

The projects are spread across the team members with multiple developers able to work on the same project. Projects are worked on in fixed timeboxes, which allows developers to apply Scrum-style working to generate velocity metrics that help improve predictions of future performance.

Longer projects cause developers to “go dark” from the team and the client. They become less visible and accessible to the client, and they cannot be made available to the rest of the team’s portfolio without sacrificing effort, and therefore attention, on their existing project. Breaking projects into smaller iterations and spreading them across the team instead means developers can be switched in and out predictably and with minimal friction.

Johanna Rothman suggests using a “Parking Lot” to hold projects in a warm state in between sprints. This provides an ideal opportunity for users to become involved and fulfil their obligations as part of an Agile project and provides the client with a predictable obligation within an existing workload as part of their day-to-day business.

Developers can be switched in and out of projects, allowing for cross-skilling and redundancy for support obligations. In the three-developer team illustrated, each project has a minimum of two developers who have worked on the project and who would be able to support it.
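Keeping at least two developers familiar with each project is easy to check mechanically if timebox assignments are recorded somewhere. This Python sketch assumes a simple, hypothetical list of (timebox, developer, project) entries; the names and projects are invented for illustration and do not reproduce the portfolio in the figure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (timebox number, developer, project): a hypothetical record of who worked where
Assignment = Tuple[int, str, str]


def coverage(assignments: Iterable[Assignment]) -> Dict[str, Set[str]]:
    """Map each project to the set of developers who have worked on it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for _timebox, developer, project in assignments:
        seen[project].add(developer)
    return dict(seen)


def under_covered(assignments: Iterable[Assignment], minimum: int = 2) -> List[str]:
    """Projects that fewer than `minimum` developers could currently support."""
    return sorted(
        project
        for project, developers in coverage(assignments).items()
        if len(developers) < minimum
    )


plan: List[Assignment] = [
    (1, "Ann", "Alpha"), (1, "Bob", "Beta"), (1, "Cat", "Gamma"),
    (2, "Bob", "Alpha"), (2, "Ann", "Beta"), (2, "Cat", "Gamma"),
]
print(under_covered(plan))  # ['Gamma']
```

Here Gamma would be flagged as a candidate for rotating a second developer in during its next timebox.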

The team lead will remember when their team has booked holidays, right? Adding holidays to the project plan makes each team member’s absence visible ahead of time, allowing developers to discuss possible risks or support obligations beforehand and build in redundancy.

The method (Scrum, XP, Kanban, etc.) is unimportant. I’m not a consultant selling ideology; this example is based on experience of working in multiple software teams.

  • Balance between developer focus and availability to the portfolio. Developers work best when focused on the task, but must often be available to the portfolio at short notice. Allowing a project to be parked makes it easier to select the most able developer, within their own schedule and timebox, to attend to an unexpected task. Equally, fixed timeboxes allow the business or client to be promised attention by a point in time, backed up by a known and published schedule.
  • Maintains sharp focus for developers who must develop within shorter iterations. Shorter timeboxes bring sooner objectives and deliverables. Tighter management of developer effort, particularly in larger projects, will help reduce drift in terms of scope and attention. This also reminds the business/client of their own role on an ongoing basis, with regular status updates and predictable resource requirement.
  • Less cost for the developer to rediscover context. Distractions cost time, which costs money, though this is rarely quantified. The reality is often that a developer will need to context switch from project to support to project to chat to project. But this can be reduced and controlled. By limiting a developer’s contextual concentration within a period of time (a timebox) to a single project, one can still accommodate day-to-day support and collaboration while maximising attention per project. Perhaps individuals could be designated “hot” for support tasks within a timebox on a rotating basis to minimise team-wide disruption.
  • Keeps the user engaged. The role of the customer is often misunderstood and under-represented in Agile. Agile approaches depend on early and regular contact with the customer to gather feedback on project deliverables so far. However, the client and wider organisation already have a job to do, which the development team must respect. Keeping the cadence of user involvement to a regular and predictable timetable helps the user plan their own time within their existing obligations. It may be unreasonable to expect the user to contribute to projects at the end of every sprint, but regular contact will keep the user engaged, particularly if they can see improvements.

There are frameworks and whitepapers and methods and processes and the rest, but these don’t interest me. What works for the team is what works for the business. That said, this structure may be integrated into your own processes, formal or not. The use of timeboxes enables integration into a wider process, such as SAFe, where individuals and teams can be “plugged in” to the wider project as required.

 

To the War-room!

I’m currently studying for an MSc in Project Management. This is making my head pop at times so I’ll blog the bits that don’t make it into my academic submissions from time to time.

It strikes me that in the various projects I’ve worked on, I’ve struggled to move between projects at a moment’s notice, flipping my consciousness in the process. In my head, I try to compartmentalise my project work so that I don’t get confused by any “leakage”. Meanwhile, my desk gets messier.

What if we could reflect this set of mental compartments in the real world, in the office? By separating project activities from each other in the office, it might just become easier to flip between projects. Robert Wysocki mentions the “War Room”, which is a room dedicated to the project. This room will probably just be a meeting room “commandeered” by the project team for their collaboration, and it requires little more than the usual office stationery and equipment for the course of the project.

The War Room should contain:

  • A whiteboard
  • A computer and projector
  • Ample water
  • A flipchart
  • Plenty of wall space and Blu Tack

The room is the “meeting point” for the project team, both for formal meetings and for collaboration, and perhaps a way to get away from the usual team and concentrate on the job in hand without distractions. Removing yourself from your usual position in the office immediately reduces distractions, and when you head to the project War Room, it’s clear to your colleagues what you are working on.

It might be messy, with scrawling across the whiteboards, papers hanging from the wall, textbooks left open and memos littering the desks. It is however a workspace, dedicated to a particular purpose. When individuals enter that room, they join the project either as a collaborative member, a manager or an observer. It’s a physical boundary between the hum-drum taking-care-of-business work and transformative, collaborative work.

Of course, not every office can accommodate such luxuries. It might be due to physical constraints (not enough rooms or space) or politics (“why should they get their own room?”). Unfortunately, the argument for designing offices around productivity has long since been lost, and we’re doomed to cubicles spread across noisy, windowless offices, so making the case for a dedicated collaborative space is going to be difficult.

Then again, if the business can’t give you a dedicated project collaboration space, how much value does it really place on the project?