In competitive markets like yours, keeping quality assurance and operations in departments separate from your development team is hardly acceptable if you want to serve your clients rapidly and continuously fulfil their demands. And yet, although most developers complain about bureaucracy in their organizations, when they are given the chance to collaborate with quality assurance and operations specialists in their own teams, they are still afraid of carrying out unsupervised production deployments on their own. This brings us to the techniques and measures we should put in place to ensure safer production deployments.
To enable smooth and continuous flow, the secret sauce of most successful DevOps organizations is deploying changes to their production systems frequently and in small batches. Because each batch is small, everyone in your DevOps teams can assess and understand the changes, and fix them when necessary.
Building an automated deployment pipeline is not sufficient on its own. You need to integrate operational telemetry into your deployment pipeline so that you quickly get feedback about the results of your changes in production and pre-production environments. Furthermore, you need to create a shared cultural understanding in your organization that everyone in your DevOps team is responsible for the health and continuity of the entire value stream and deployment pipeline.
Never consider a change or deployment “done” until you have proven that it does what it was designed and coded to do. After each deployment, closely monitor the metrics of the changed modules, any newly created metrics, and the metrics of other components in your system that may be affected by your change.
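As an illustration of turning this "done" criterion into a telemetry-driven check in the pipeline, here is a minimal sketch in Python. It assumes that every metric is "lower is better" (error rate, latency) and that the pipeline can fetch a pre-deployment baseline and a post-deployment value for each metric; the `MetricCheck` type, the `verify_deployment` function, the metric names and the tolerances are all hypothetical and not part of any particular monitoring product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricCheck:
    name: str
    baseline: Optional[float]   # pre-deployment value; None for a metric introduced by this change
    current: Optional[float]    # post-deployment value; None if the metric is not being reported
    max_increase: float         # tolerated relative increase, e.g. 0.10 for +10%


def verify_deployment(checks: List[MetricCheck]) -> List[str]:
    """Return the names of failing metrics; an empty list means the change can be marked "done".

    Assumption: every metric is "lower is better" (error rate, latency, queue depth).
    """
    failures = []
    for check in checks:
        if check.current is None:
            # A metric that stopped (or never started) reporting is a failure in itself.
            failures.append(check.name)
        elif check.baseline is None:
            # Newly created metric: no baseline yet, so only require that it reports.
            continue
        elif check.current > check.baseline * (1 + check.max_increase):
            # Changed or potentially affected component regressed beyond its tolerance.
            failures.append(check.name)
    return failures


checks = [
    # Metric of the changed module: allowed up to 0.012, observed 0.011.
    MetricCheck("checkout.error_rate", baseline=0.010, current=0.011, max_increase=0.20),
    # Metric created by this change: it is reporting, so it passes.
    MetricCheck("checkout.fraud_check_ms", baseline=None, current=42.0, max_increase=0.0),
    # Metric of another component that may be affected: allowed up to 275 ms.
    MetricCheck("payments.p99_latency_ms", baseline=250.0, current=340.0, max_increase=0.10),
]
print(verify_deployment(checks))  # ['payments.p99_latency_ms']
```

A non-empty result would keep the change open (or trigger a rollback) rather than letting it be marked "done"; real pipelines would typically compare over time windows rather than single values.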
Even though you run automated tests in your pre-production environments and monitor your system under test with your telemetry infrastructure, issues will still reach your production systems. You can’t prevent every problem from happening, but you can be very well prepared to rectify problems when they happen. If a change breaks your deployment pipeline, bring in all the subject matter experts needed to undo the problem and restore the pipeline to health. The following are three frequently used methods for solving such issues:
Rotate the people in your DevOps team through the responsibilities of the operations team, so that they handle operational incidents themselves. In this way everyone in your value stream gains a sense of the challenges and responsibilities of downstream work centers. Put your developers, testers, architects, designers, managers and directors on unscheduled operational duty, so that they receive incident alert calls at 3am. This helps everyone in your value stream form a solid understanding of the consequences of the decisions they make in their daily jobs.
Such a rotation keeps operations specialists from feeling isolated and alone. Everyone in your DevOps team helps to strike a proper balance between fixing production incidents, reducing technical debt and developing new features. Unsurprisingly, when architects and developers are the ones woken up at 3am, incidents tend to get fixed much faster.
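One simple way to put every role on the pager is a round-robin weekly rotation. The Python sketch below assumes one on-call person per week and a team list that mixes roles; the `build_rotation` function, the names and the start date are hypothetical, and real rotations usually also handle holidays, time zones and secondary on-call.

```python
from datetime import date, timedelta
from itertools import cycle
from typing import List, Tuple


def build_rotation(people: List[str], start: date, weeks: int) -> List[Tuple[date, str]]:
    """Assign one on-call person per week, cycling through everyone on the team."""
    return [
        (start + timedelta(weeks=week), person)
        for week, person in zip(range(weeks), cycle(people))
    ]


# Hypothetical team: operations specialists share the pager with every other role.
team = [
    "Ana (developer)",
    "Ben (tester)",
    "Chen (architect)",
    "Dana (designer)",
    "Eli (manager)",
    "Fay (operations)",
]

for week_start, on_call in build_rotation(team, date(2024, 1, 1), weeks=8):
    print(week_start.isoformat(), on_call)
```

Cycling through the whole list, rather than only the operations staff, is the point of the design: each role takes its turn at the 3am calls.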
When developers observe their clients using their software, they have many aha moments about what they should improve immediately. The same holds when architects, designers, developers and testers observe the other downstream work centers in the software engineering lifecycle. Once they comprehend the impact of their work on downstream work centers, they gain a new angle from which to improve the quality of their work and fine-tune its outcomes, so that downstream work centers can perform better. Everyone in your DevOps team starts to take on non-functional operational requirements as part of their daily work, tracked in their backlogs. This is only possible by enabling quick and continuous feedback loops within your DevOps organization.
It is very difficult to transfer lessons learned from real production systems to development teams. Therefore, some prominent DevOps organizations, including Google, make their development teams responsible for operating software during and after the initial product launch. In this developer-managed state of a product, operations engineers act as consultants. Once the product has proven stable enough in production, for about six months, it is handed off to the operations team. This hand-off can only happen if the product in production already passes a number of checks covering past and ongoing defects, telemetry coverage, out-of-hours incidents, loosely coupled architectural design, and change and deployment safety.
If significant design or coding issues surface while the product is in the operations-managed state, it can be handed back to the developers. In the developer-managed state, developers are in charge of stabilizing the software, whereas operations engineers act as consultants.
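The hand-off and hand-back rules described above can be modelled as a small state machine. The following Python sketch is not Google's actual review process; it only encodes the checks listed in the text, and every threshold (180 days as an approximation of the six months, the defect, telemetry and incident limits) as well as the `ServiceReview`, `failed_checks` and `next_state` names are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    DEVELOPER_MANAGED = "developer-managed"
    OPERATIONS_MANAGED = "operations-managed"


@dataclass
class ServiceReview:
    days_stable_in_production: int
    open_defects: int
    telemetry_coverage: float                # fraction of components emitting telemetry, 0..1
    out_of_hours_incidents_per_month: float
    loosely_coupled: bool
    safe_deployments: bool                   # e.g. automated, reversible deployments


# Hypothetical thresholds; each organization would choose its own.
MIN_STABLE_DAYS = 180
MAX_OPEN_DEFECTS = 5
MIN_TELEMETRY_COVERAGE = 0.9
MAX_OUT_OF_HOURS_INCIDENTS = 1.0


def failed_checks(review: ServiceReview) -> List[str]:
    """Return the names of the hand-off checks the service does not yet pass."""
    checks = {
        "stability": review.days_stable_in_production >= MIN_STABLE_DAYS,
        "defects": review.open_defects <= MAX_OPEN_DEFECTS,
        "telemetry": review.telemetry_coverage >= MIN_TELEMETRY_COVERAGE,
        "out_of_hours": review.out_of_hours_incidents_per_month <= MAX_OUT_OF_HOURS_INCIDENTS,
        "architecture": review.loosely_coupled,
        "deployment_safety": review.safe_deployments,
    }
    return [name for name, ok in checks.items() if not ok]


def next_state(current: State, review: ServiceReview, significant_issues_found: bool = False) -> State:
    if current is State.DEVELOPER_MANAGED:
        # Hand off to operations only when every check passes.
        return State.OPERATIONS_MANAGED if not failed_checks(review) else current
    # Operations-managed: hand back to developers if significant design or coding issues surface.
    return State.DEVELOPER_MANAGED if significant_issues_found else current


review = ServiceReview(days_stable_in_production=200, open_defects=2, telemetry_coverage=0.95,
                       out_of_hours_incidents_per_month=0.5, loosely_coupled=True, safe_deployments=True)
print(next_state(State.DEVELOPER_MANAGED, review))  # State.OPERATIONS_MANAGED
```

Returning the list of failed checks, rather than a single yes or no, tells the developer team exactly what to stabilize before the next review.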
This chapter covers various techniques for ensuring a successful and safer flow through the deployment pipelines of your DevOps organization.
These techniques exemplify the mutual respect and collaboration between developers and operations engineers in your DevOps teams.