Let’s Work Together to Make Painful Software Outages a Thing of the Past

The pace of innovation and software delivery is at an all-time high, which is an exciting and encouraging development. But as our world becomes increasingly reliant on digital solutions, and those solutions become more and more interconnected, it’s time to pause and make sure we’re doing everything we can as an industry to minimize the inherent risks of software delivery. How we scope and manage software delivery matters more than ever, not only for the vendors of that software but for end users across the world and across industries. Add the growing volume of code being shipped, driven in large part by innovation in AI-assisted development, and the stakes and risks for technology providers are higher than ever before.

Let's face it: bugs are inevitable. From the early days of coding on punch cards to today's much more advanced workflows, developers have always had to face unexpected issues. All of us can empathize with the frustration and embarrassment of an outage or buggy software. But while bugs are inevitable, the disruptions they can cause don't have to be.

It’s estimated that a significant portion of outages are internally generated rather than caused by third parties, infrastructure failures, or other external forces. This highlights the importance of internal processes and tools that help minimize these risks. At LaunchDarkly, we believe that together we can help prevent these disruptions to your business and ensure a smoother software delivery experience for your engineering teams. Here's how to start:

Progressive Rollouts: Gradually introducing new features allows controlled exposure and real-time impact assessment. Instead of deploying a new feature to all users at once, start with a small segment, perhaps 1%, then expand to 5% and then 10%, moving to the next group only when you're confident in the stability of your code. This phased approach contains potential disruptions and keeps small updates from becoming big problems.
Automated Monitoring and Instant Rollbacks: Continuous monitoring of feature performance allows early detection of issues, and the ability to revert to a previous state keeps services and applications reliable. Rollback to a previous version should be instant, ideally 200 milliseconds or less. This quick response is crucial for maintaining continuity and performance, especially during business-critical periods. According to a recent third-party survey, 86% of LaunchDarkly customers recover from software incidents within a day or less; that’s where you want to be.
Runtime Configuration Management: Sometimes swift adjustments are necessary in a production environment, even for minor changes. By incorporating clear demarcations, or flags, in your code, you can quickly toggle features on or off. Modifying settings without deploying new code gives you the flexibility to respond rapidly to unforeseen challenges or shifts in a live environment while maintaining the reliability your customers need. In a recent third-party survey, LaunchDarkly customers reported nearly 2x fewer user complaints than non-customers, in part due to the reliability our platform provides.
Targeted Segments: The best software teams in the world are able to target experiences and rollouts to fit specific devices, regions, customer groups, and operating systems. By tailoring features based on various parameters, these teams achieve a more personalized approach, enhancing the user experience while simultaneously reducing broader risks. This strategic targeting ensures that updates are rolled out in a controlled and secure manner, optimizing both functionality and security for different user segments.
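The practices above share one mechanism: a runtime check that decides, per user, which code path runs. As a rough illustration only, here is a minimal Python sketch of such a check. It is not LaunchDarkly's SDK or its evaluation logic; the Flag class, its fields, and the bucket and is_enabled helpers are hypothetical names chosen for this example. The sketch hashes each user into a stable bucket so a rollout can widen from 1% to 5% to 10% without reshuffling who already has the feature. It also restricts exposure to an assumed region segment and treats switching the flag off as the instant rollback.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical feature flag for illustration; not LaunchDarkly's data model."""
    key: str
    enabled: bool = True            # kill switch: False sends everyone down the old path
    rollout_percent: float = 0.0    # share of eligible users (0-100) who get the new path
    target_regions: set[str] = field(default_factory=set)  # empty set means all regions


def bucket(flag_key: str, user_id: str) -> float:
    """Map a user to a stable value in [0, 100) so rollout decisions are sticky."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    # The first 8 hex digits give an integer in [0, 2**32); scale it into [0, 100).
    return int(digest[:8], 16) / 2**32 * 100


def is_enabled(flag: Flag, user_id: str, region: str) -> bool:
    """Decide whether this user sees the new feature."""
    if not flag.enabled:                    # instant rollback: no redeploy needed
        return False
    if flag.target_regions and region not in flag.target_regions:
        return False                        # outside the targeted segment
    # A user's bucket never changes, so raising the percentage only adds users.
    return bucket(flag.key, user_id) < flag.rollout_percent


if __name__ == "__main__":
    flag = Flag(key="new-checkout", target_regions={"eu-west"})
    users = [f"user-{i}" for i in range(10_000)]
    for pct in (1.0, 5.0, 10.0):            # progressive rollout steps
        flag.rollout_percent = pct
        exposed = sum(is_enabled(flag, u, "eu-west") for u in users)
        print(f"{pct}% rollout -> {exposed} users exposed")
    flag.enabled = False                    # something went wrong: roll back
    print("after rollback:", sum(is_enabled(flag, u, "eu-west") for u in users))
```

Keying the hash on both the flag and the user keeps different flags from always selecting the same early-adopter group. In a real system, the flag state would live in a central service rather than in process memory, so a single change takes effect everywhere at once.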

For C-suite executives and technology leaders, the focus on advanced release management and software stability is more than a technical necessity; it is a business imperative. Creating a resilient framework that not only supports but also boosts innovation while maintaining operational stability is crucial. One of the worst outcomes of a large outage is engineering teams becoming innovation-averse out of fear. It’s imperative that bugs not become the enemy of innovation, and they don’t have to be!

Having the right tools and practices to not only improve but also de-risk our software delivery process is absolutely essential. As we see an exponential increase in the amount of code being shipped, and as software vendors become ever more interconnected, the need to prevent disruptive outages has become more pressing than ever. Major global and national events are looming, and for our retail partners, the busiest shopping season is just around the corner. And at the same time, AI is speeding ahead, pushing the boundaries of what's possible. In these moments, being confident in how new software is being delivered is not just beneficial—it's mission critical.

If you’re not confident in your release process or want to learn more, we have a team of experts here to talk.

Dan Rogers

CEO