Making releases safe and ‘boring’ at O’Reilly
Before
Recovered from bugs in 2-3 hours
After
Recover from bugs in 30 minutes
Ship code to production weeks faster
About O'Reilly Media
O’Reilly Media provides technology and business training, knowledge, and insight to help companies succeed. Its network of experts and innovators shares knowledge and expertise at O’Reilly conferences and through the company’s SaaS-based training and learning solution, O’Reilly Learning. Customers include Silicon Valley companies like Google, Amazon, Netflix, and Tesla, giants in industrial, banking, and other sectors, and millions of users across enterprise, consumer, and university channels. The company’s tech books include many best-sellers. O’Reilly is headquartered in Sebastopol, California.
Challenge
One of O’Reilly’s most popular products is the O’Reilly Learning Platform, which gives subscribers web-based access to thousands of books. In the past, O’Reilly ran the platform on a single code base and used an open-source library to perform feature flagging (toggling) for testing and deployment. When the company later decided to replace its monolith with a microservices architecture, it experienced serious issues with feature flagging.
With the open-source library, every microservice made direct API calls to O’Reilly Learning’s core code, significantly degrading performance. Toggling thus became unreliable: after a flag was toggled, the change might not take effect for a long time.
Moreover, creating and managing flags became unwieldy and time-consuming. For example, if O’Reilly wanted to turn off a feature across several different microservices, it couldn’t just toggle one flag and be done with it; engineers instead had to flip flags for that feature in each separate microservice.
All these challenges deterred engineers from using feature flags, depriving them of the benefits flags offer: the ability to ship code faster and to safely test features in production before a big release.
Solution
O’Reilly wanted a solution that would speed up system performance, be more flexible and reliable, and allow the use of a single switch to toggle a feature on or off in multiple microservices. O’Reilly considered building its own solution, but when the company came across LaunchDarkly, it chose that instead. With LaunchDarkly, each microservice no longer has to make calls to a central service, and flags can be set simultaneously for multiple microservices and apps.
Chris Guidry, O’Reilly’s VP of Engineering, was especially drawn to LaunchDarkly’s streaming architecture. Instead of straining O’Reilly’s infrastructure every time a flag needed to be altered, LaunchDarkly evaluated the flag on the client side. This meant flag changes could be made in 200 milliseconds without affecting system performance in the slightest.
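The idea of streaming flag changes to services that evaluate flags locally can be sketched as follows. This is a minimal in-memory illustration, not LaunchDarkly's SDK or API: the `FlagStream` and `Microservice` classes, the service names, and the `new-side-nav` flag key are all hypothetical, and a real system would push updates over a network connection rather than a function call.

```python
from typing import Callable, Dict, List


class FlagStream:
    """Hypothetical stand-in for a streaming flag service that pushes changes."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, bool], None]] = []

    def subscribe(self, callback: Callable[[str, bool], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, flag_key: str, enabled: bool) -> None:
        # A single change fans out to every subscribed service.
        for callback in self._subscribers:
            callback(flag_key, enabled)


class Microservice:
    """Hypothetical service that keeps a local copy of flag state."""

    def __init__(self, name: str, stream: FlagStream) -> None:
        self.name = name
        self._flags: Dict[str, bool] = {}
        stream.subscribe(self._on_update)

    def _on_update(self, flag_key: str, enabled: bool) -> None:
        self._flags[flag_key] = enabled

    def is_enabled(self, flag_key: str, default: bool = False) -> bool:
        # Evaluation reads the local copy only; no call to a central service.
        return self._flags.get(flag_key, default)


stream = FlagStream()
services = [Microservice(name, stream) for name in ("search", "reader", "billing")]

# One toggle reaches all three services at once.
stream.publish("new-side-nav", True)
states_on = [service.is_enabled("new-side-nav") for service in services]

# Turning the flag off again is the same single operation.
stream.publish("new-side-nav", False)
states_off = [service.is_enabled("new-side-nav") for service in services]
```

Because each service reads from its own copy, checking a flag costs a dictionary lookup, and one published change replaces the per-microservice flag flipping described in the Challenge section.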
I could have spent years thinking about how to solve the problem we had with toggling in our microservices environment and still not come up with a solution. The moment I saw how LaunchDarkly handled flags, I knew it was the right fit for our architecture.
Chris Guidry
VP of Engineering, O'Reilly
QA, testing in production, and marketing launches
LaunchDarkly has also changed O’Reilly’s approach to testing and releases. Before, engineers would submit new features, many of them unrelated to each other, to the quality assurance (QA) team for testing. None of these features could be deployed to production until QA had vetted and approved every one of them, which created a backlog. At the same time, O’Reilly would wait to put these new features in production until the day of a marketing launch (release). In other words, code deployments were tied to feature releases, which made each release high-risk.
"Releases used to be nerve-wracking," explained Guidry. "When a production issue occurred, we’d have ten people sitting on a call together, all typing code at the exact same time, trying to fix the issue. It was stressful."
LaunchDarkly enabled O’Reilly to decouple deployments from releases, which had a transformative effect.
With LaunchDarkly, our engineers can ship code whenever they want. We can test features in production well in advance of a marketing launch. And if a feature causes problems on the day of the launch, we can just turn it off with a kill switch—no rollbacks. LaunchDarkly makes our releases boring. That’s exactly what we want.
Chris Guidry
VP of Engineering, O'Reilly
Besides enabling O’Reilly’s software team to deploy faster and take the risk out of releases, LaunchDarkly also gave them more control. Product managers can use flags to turn on individual features for different classes of users, for example, turning on side navigation for internal staff but not for customers, who might be confused by the feature. Engineering teams can ship code in small batches with new features turned off, and when the marketing team wants to launch and publicize a new feature or version of O’Reilly Learning, engineers can then turn on all the new features using flags.
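As a rough sketch of how targeting by user class, a launch-day rollout, and a kill switch might fit together in one flag, consider the following. The `Flag` class, its fields, and the group names are hypothetical and chosen only for illustration; this is not how LaunchDarkly models flags internally.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    """Hypothetical flag with a master switch and simple group targeting."""

    key: str
    enabled: bool = False  # master switch; setting it to False is the kill switch
    allowed_groups: Set[str] = field(default_factory=set)
    everyone: bool = False  # set to True on launch day

    def evaluate(self, user_group: str) -> bool:
        if not self.enabled:
            return False
        return self.everyone or user_group in self.allowed_groups


# Code is already deployed; only internal staff see the feature.
side_nav = Flag(key="side-navigation", enabled=True, allowed_groups={"internal-staff"})
staff_sees_it = side_nav.evaluate("internal-staff")
customer_sees_it = side_nav.evaluate("customer")

# Marketing launch: release to all users without a new deployment.
side_nav.everyone = True
customer_after_launch = side_nav.evaluate("customer")

# The feature misbehaves: turn it off for everyone, with no rollback.
side_nav.enabled = False
customer_after_kill = side_nav.evaluate("customer")
staff_after_kill = side_nav.evaluate("internal-staff")
```

The point of the sketch is that deployment and release are separate events: the code ships once, and every later change in who sees the feature is a change to flag state.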
All told, releases are smoother and safer with LaunchDarkly.
Results
O’Reilly now ships new features and versions of O’Reilly Learning to production weeks faster than before. It can also fix bugs and recover from problematic software changes in thirty minutes, rather than two to three hours.
O’Reilly Learning is used by both individuals and larger enterprises, and flags allow the same code base to serve all of them. O’Reilly turns different features on and off for individual customers and enterprises, so each gets a version of the service customized for it. This has improved the customer experience.
LaunchDarkly straight out lets us ship code faster because we can ship it when it’s not completely finished and can ship it regardless of when a marketing launch is scheduled. Every part of LaunchDarkly helps us move more quickly.
Chris Guidry
VP of Engineering, O'Reilly
The company also uses flags to mitigate the effects of website attacks when malware attempts to log in to O’Reilly Learning. When the site is under attack, engineers use a flag to turn on CAPTCHA login, which makes it difficult for bots to log in to the site.
With LaunchDarkly, O’Reilly is delivering software faster, reducing incidents, and recovering faster from the incidents that do arise. Engineers are building more features and deploying to production more frequently. Product managers and marketers can release features at an optimal time for the business (and customers). QA is happier too.
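The CAPTCHA login flag mentioned above could look something like the following sketch. The flag key `require-captcha-on-login`, the `FLAGS` dictionary, and the `login` function are hypothetical; the sketch assumes the flag state is already available locally and stubs out the actual password and CAPTCHA checks as boolean inputs.

```python
from typing import Dict

# Hypothetical local flag state; off during normal operation.
FLAGS: Dict[str, bool] = {"require-captcha-on-login": False}


def login(password_ok: bool, captcha_ok: bool = False) -> str:
    """Return the outcome of a login attempt under the current flag state."""
    if FLAGS["require-captcha-on-login"] and not captcha_ok:
        # While the flag is on, no attempt proceeds without a solved CAPTCHA.
        return "captcha_required"
    return "ok" if password_ok else "denied"


# Normal operation: valid credentials are enough.
normal_login = login(password_ok=True)

# Site under attack: engineers turn the flag on.
FLAGS["require-captcha-on-login"] = True
bot_login = login(password_ok=True)
human_login = login(password_ok=True, captcha_ok=True)

# Attack over: the flag goes back off and logins return to normal.
FLAGS["require-captcha-on-login"] = False
after_attack_login = login(password_ok=True)
```

Keeping the CAPTCHA path in the code base but switched off means the defense can be activated in response to an attack without a deployment, and removed again once the attack subsides.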
In doing all this, LaunchDarkly has helped O’Reilly live up to the promise it makes to its customers: to deliver the newest information about technology quickly in a world that prizes up-to-the-minute delivery.