How LaunchDarkly guards releases to build trust and mitigate risk

Before

  • Manually correlating flag changes with latency and errors
  • Engineering time and resources spent huddling to resolve release-related incidents
  • Limited visibility into the impact of feature releases on frontend and backend systems
  • Reactive rollbacks that required real-time investigation and risked a degraded user experience and more support tickets

After

  • Specific request-based and member-based metrics track errors directly linked to flagged features
  • Engineers rely on automated rollbacks triggered by real-time error thresholds
  • Statistical analysis embedded in Guarded Releases provides actionable insight into the relationship between flags and errors
  • Engineers can focus on new projects knowing that release monitoring is happening in the background

About the LaunchDarkly Fundamentals Engineering Team

At LaunchDarkly, our success depends on close coordination with product engineering teams and managers to understand their timelines and challenges, which lets us anticipate shared needs and build supporting features ahead of demand. For us, success means that product engineering teams have fully adopted and integrated the capabilities we build, making them an essential part of the LaunchDarkly product engineering ecosystem.

To support widespread adoption, the Fundamentals Engineering Team at LaunchDarkly builds and refines a shared platform of core capabilities for internal product engineering teams. The team works closely with product engineers to develop impactful features and products faster, benefiting internal teams and our customers alike.

Challenge

At LaunchDarkly, we face the same challenges as many of our customers who are moving from legacy codebases to more powerful and intricate architectures. New features and deployments still carry some risk, even when they are wrapped in feature flags. Our customers expect our platform to be reliable and fast, but changes can sometimes have an unexpected, negative impact on UI components and backend processes. The real sticking point is quickly identifying which flag broke a feature or, better still, halting a bad release before it reaches more customers.

“Previously we could theoretically track the effect of flag changes by copying flag values to spans in our tracing tools. In practice that wasn't done because it was byzantine and not discoverable, so we generally weren't able to correlate flags with latencies and errors directly. We'd more often get paged and then see which flags were recently changed.” – Mike Zorn, Engineer, Fundamentals Engineering

This manual correlation process meant tracking down the engineers who had recently deployed flags, racing against the clock to diagnose which flag was the problem, and only then turning it off.

Solution

Mike wanted to use the Guarded Releases functionality in LaunchDarkly to create a process that is repeatable, covers the common ways the company releases software, and gives teams the tools they need to guard every release.

“Initially, we wanted to have per-route metrics and make other measurements, but [looking at] errors is a great place to start and easy to scale to multiple teams.” – Mike Zorn

Mike’s Guarded Release recipe combines request-based and member-based metrics so that teams can monitor signals such as HTTP request errors and frontend errors at different severity levels. The process enables contextual rollouts: it assesses the errors experienced by users who are served the flagged feature and runs a statistical analysis to detect whether the new feature is correlated with those errors.

Based on the results, Guarded Releases either rolls back the release automatically or proactively sends an alert when it detects increased error rates, giving everyone visibility into what happened and why.
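To make the idea of “statistical analysis plus an error threshold” more concrete, here is a minimal, hypothetical sketch of the kind of check a guarded rollout might run. It is not LaunchDarkly’s implementation: the names `Arm`, `should_roll_back`, `z_threshold`, and `min_requests` are illustrative, and a one-sided two-proportion z-test stands in for whatever analysis the product actually performs. It compares the error rate of users served the new flag variation (treatment) with users on the existing variation (control) and recommends a rollback only when the increase is statistically significant.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    """Request and error counts for one flag variation (hypothetical structure)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(control: Arm, treatment: Arm,
                     z_threshold: float = 2.33,  # roughly a one-sided 99% confidence level
                     min_requests: int = 100) -> bool:
    """Return True when the treatment error rate is significantly higher than control.

    Illustrative only: a one-sided two-proportion z-test, not LaunchDarkly's method.
    """
    # Wait for enough traffic before drawing any conclusion.
    if control.requests < min_requests or treatment.requests < min_requests:
        return False
    # Pooled error rate under the null hypothesis that both variations behave the same.
    pooled = (control.errors + treatment.errors) / (control.requests + treatment.requests)
    if pooled in (0.0, 1.0):
        return False  # no variance to test against (all successes or all errors)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.requests + 1 / treatment.requests))
    z = (treatment.error_rate - control.error_rate) / se
    # Only an increase in errors (positive z) beyond the threshold triggers a rollback.
    return z > z_threshold
```

For example, `should_roll_back(Arm(10000, 10), Arm(1000, 20))` flags a jump from a 0.1% to a 2% error rate, while identical error rates or too little treatment traffic return `False`. Requiring a minimum sample before deciding is one way to avoid rolling back on noise from the first handful of requests.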

Results

Guarded Releases are now mandatory at LaunchDarkly, because we want not only to build safety into our release process but also to use what we learn to improve the product and experience for our customers.

"Guarded Releases gives us the peace of mind that almost all instability caused by a new feature has minimal impact. It provides us with confidence that no feature should expose our customers to new errors, and if it does, the feature will be rolled back before support tickets come in." – Sonesh Surana, SVP, Engineering

It didn’t take long for the new process to prove itself. In one instance, Lexi Ross, Senior Engineering Manager, saw a Guarded Release roll back automatically after it exceeded an error-rate threshold, by which point it had affected only five customers.

“It’s pretty cool that it was that sensitive. If we hadn’t used a Guarded Release, the bug might have impacted hundreds of customers before we caught it, preventing them from using the targeting page.” – Lexi Ross, Senior Engineering Manager

In another example, a Guarded Release for a code reference change caused some self-service users to encounter a 500 Internal Server Error when they tried to log in. Within 20 minutes of the first customer ticket, the team identified the flag causing the error and rolled it back. Once the issue was resolved, one customer wrote back to the Support team: “WOW! That was so fast! I'm very impressed. Thank you!”

At LaunchDarkly, building exceptional customer relationships and experiences is one of our fundamental operating principles. We’ve built, tested, and iterated on Guarded Releases to help give you visibility into how your software changes impact end users. By sharing our success stories, we hope to provide a roadmap for you to continue pushing the pace of innovation and improving the experience of your customers and your engineering and development teams.

“Bad releases happen, and the real pain of recovering from bugs today involves hours of jumping between multiple tools and pulling engineers away from other projects to figure out what is to blame. We don’t want precious engineering resources wasted watching and waiting with their hands on the ‘recover’ button. Implementing Guarded Releases across all teams lets everyone get back to the good work of building new products and improving the user experience.” – Claire Vo, Chief Product and Technology Officer

Related Content

More about Product Updates

November 15, 2024