TrueCar deploys 20 times a day, migrates 500 websites to AWS Cloud.
Deploy 20 times/day, up from once/week
Completed cloud migration with zero downtime caused by infrastructure issues
TrueCar is a digital automotive marketplace dedicated to being the most transparent brand in the industry. Consumers receive upfront pricing information when they connect with one of TrueCar’s 16,000 Certified Dealers, creating a more confident buying experience. TrueCar operates its own site and powers car-buying programs for over 300 companies (partners), including some of the most trusted brands in the world.
TrueCar is a digital automotive marketplace that provides pricing transparency and enables consumers to engage with and receive offers from TrueCar Certified Dealers. Nearly half of all new and used car buyers in the U.S. engage with the TrueCar auto-buying programs in some way during the purchasing process. This online marketplace includes not only the company's flagship website and mobile apps but also 500 auto-buying programs on affiliate sites. And TrueCar's engineers and product managers (roughly 300 people) are responsible for operating all of it.
Given that scale, the team used to feel uneasy about introducing changes to this ecosystem. Regis Wilson, Site Reliability Engineer (SRE) at the company, captured the essence of this fear when he said, "If TrueCar's site goes down, there's no more TrueCar." To mitigate risk, engineers took a Waterfall approach to software delivery. They'd work painstakingly to make features perfect before unveiling them to the world, typically a three-month process. These long development cycles were, in large part, a result of code deployments being bound to code releases. As soon as a feature was deployed, it became available to every site visitor across many domains. Thus, it was critical to get a feature right the first time. This was all before LaunchDarkly.
LaunchDarkly’s feature management platform gave TrueCar an easy way to create and manage feature flags, and to do so at scale across a wide range of complex use cases. In other words, LaunchDarkly enabled TrueCar to "dark launch." This transformed the company's development practices. Now, TrueCar:
* Tests new functionality in production
* Performs targeted rollouts to discrete segments (i.e., websites)
* Does progressive releases and percentage rollouts
* Turns off defective features without having to redeploy
* Gives product managers control over features
* Runs A/B tests to improve feature quality
TrueCar can now measure a feature’s impact on system performance and user engagement early and often. With progressive releases, teams can roll a feature out to a small audience, test it, disable it, tweak it, push it back out, and so on until the feature reaches its peak performance. Then, and only then, do they release it to the whole ecosystem. Or, thanks to targeted rollouts, they can choose to reveal a feature only to designated partners. This prevents scenarios where features get activated on the wrong affiliate sites. It also reduces the load on TrueCar's infrastructure.
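To make the mechanics concrete, here is a minimal sketch of how a flag check can combine a kill switch, a targeted rollout, and a percentage rollout. It is not LaunchDarkly's SDK or data model: the `FeatureFlag` shape, the `hash32` function, the flag key, and the site names are all assumptions made for illustration.

```typescript
// Hypothetical flag shape for illustration only; not LaunchDarkly's data model or SDK.
interface FeatureFlag {
  key: string;
  enabled: boolean;       // kill switch: set to false to turn the feature off without a redeploy
  targetSites: string[];  // targeted rollout: sites that always receive the feature
  rolloutPercent: number; // percentage rollout for all other sites, from 0 to 100
}

interface SiteVisitor {
  id: string;
  site: string;
}

// Simple 32-bit string hash (an assumption; any stable hash would do).
function hash32(input: string): number {
  let h = 0;
  for (let i = 0; i < input.length; i++) {
    h = (h * 31 + input.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Assign a visitor to a stable bucket in [0, 100) for a given flag,
// so the same visitor gets the same answer on every request.
function bucket(flagKey: string, visitorId: string): number {
  return hash32(flagKey + ":" + visitorId) % 100;
}

function isEnabled(flag: FeatureFlag, visitor: SiteVisitor): boolean {
  if (!flag.enabled) return false;                                // kill switch wins
  if (flag.targetSites.indexOf(visitor.site) !== -1) return true; // targeted sites
  return bucket(flag.key, visitor.id) < flag.rolloutPercent;      // everyone else, by percentage
}

// Example: a progressive release at 10%, always on for one hypothetical partner site.
const newSearch: FeatureFlag = {
  key: "new-search-ui",
  enabled: true,
  targetSites: ["partner-a.example.com"],
  rolloutPercent: 10,
};
```

Because the bucket depends only on the flag key and the visitor ID, raising `rolloutPercent` from 10 to 50 keeps the feature on for everyone who already had it, which is what lets a team widen a release step by step. Setting `enabled` to `false` turns the feature off for all visitors, including targeted sites, with no new deployment.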
The ability to delegate feature control to product managers has also had a profound impact. First, it has allowed product managers to run far more A/B tests and beta tests, thus elevating the quality of new functionality. Secondly, it has freed up engineering resources. "Now that we're doing feature flag delegation with LaunchDarkly, product managers are driving the spaceship, and we are working on the engine," said Regis. "That's how it should be. Product managers have more contextual knowledge than I do about the business implications of a given feature. So it makes more sense for them to manage the rollouts for those features. In the past, a PM might ask me to release a feature to 10% of XYZ user group, and I'd be left wondering – release which feature to whom, how, when? But today, PMs can manage that whole process themselves. It's a much better system."
The improvements to the release process constituted one piece of a much larger effort to transform the web experience. As a part of that broader context, TrueCar planned to make another major change: migrate its website infrastructure to the cloud. The company chose Amazon Web Services (AWS) as its cloud infrastructure as a service (IaaS) provider. Regis had shouldered the burden of an AWS migration several times before at other companies. But there was a key difference between then and now: in past migrations, he’d never had the luxury of regulating web traffic with feature flags. LaunchDarkly changed that.
For TrueCar’s migration, Regis had to move existing data from the company's on-premises data center to AWS. Doing this safely required him to funnel data in small increments, searching for errors throughout. Perhaps more daunting, Regis had to figure out what to do with new web requests. He couldn’t, for example, just start routing all new web traffic to the cloud. The risks were too high. But the solution wasn’t as simple as channeling, say, 10% of all traffic to AWS and 90% to the legacy infrastructure. Web requests across TrueCar's 500 sites were split into different categories. Thus, Regis needed to route traffic not only by percentage but also by traffic type. That is, he needed to selectively switch traffic and applications between the legacy and new platforms. The scale and complexity of this challenge were great. Thankfully, Regis discovered he could apply the same LaunchDarkly capabilities used for release management to traffic routing.
Here is how the TrueCar team did it. They adopted CloudFront, Amazon's content delivery network (CDN) solution, to spread the computing workload for web requests across a distributed network of servers. This would speed up load times and improve site performance. Regis set up a rules routing engine with Lambda@Edge functions, a core feature of CloudFront. He then integrated Lambda@Edge with LaunchDarkly’s Node.js software development kit (SDK) to tie feature flags to these rules. LaunchDarkly's clean and simple user interface made it easy to manage the vast sprawl of web traffic flows. Regis' team used targeted rollouts in LaunchDarkly to route distinct web requests (e.g., site searches for new cars vs. used ones) to the correct endpoints, while, at the same time, routing that traffic by percentage. In other cases, they would randomly generate numbers and divide traffic by weight.
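The routing logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not TrueCar's code: it uses neither the Lambda@Edge event shape nor the LaunchDarkly SDK, and the `EdgeRequest` type, the rule categories, the URL paths, and the percentages are all hypothetical. In the setup the article describes, the percentages would be driven by feature flags rather than hard-coded.

```typescript
// Hypothetical types and paths for illustration; not the Lambda@Edge event shape or LaunchDarkly's SDK.
type Backend = "legacy" | "aws";

interface EdgeRequest {
  visitorId: string;
  path: string;
}

// One rule per traffic category, with the share of that category sent to AWS.
interface RoutingRule {
  category: string;
  matches: (req: EdgeRequest) => boolean;
  awsPercent: number; // 0 to 100; in practice this value would come from a flag
}

// Simple 32-bit string hash (an assumption; any stable hash would do).
function hashString(input: string): number {
  let h = 0;
  for (let i = 0; i < input.length; i++) {
    h = (h * 31 + input.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Route by traffic type first, then by percentage within that type.
function chooseBackend(rules: RoutingRule[], req: EdgeRequest): Backend {
  for (const rule of rules) {
    if (rule.matches(req)) {
      const slot = hashString(rule.category + ":" + req.visitorId) % 100;
      return slot < rule.awsPercent ? "aws" : "legacy";
    }
  }
  return "legacy"; // unmatched traffic stays on the legacy platform
}

interface WeightedBackend {
  backend: Backend;
  weight: number;
}

// Weighted random split: draw a number and walk the cumulative weights.
function pickWeighted(choices: WeightedBackend[], draw: number = Math.random()): Backend {
  let total = 0;
  for (const c of choices) total += c.weight;
  let threshold = draw * total;
  for (const c of choices) {
    if (threshold < c.weight) return c.backend;
    threshold -= c.weight;
  }
  return "legacy"; // fallback for an empty list or rounding at the upper edge
}

// Example rules: used-car searches move to AWS faster than new-car searches.
const rules: RoutingRule[] = [
  { category: "used-car-search", matches: (r) => r.path.indexOf("/used-cars") === 0, awsPercent: 25 },
  { category: "new-car-search", matches: (r) => r.path.indexOf("/new-cars") === 0, awsPercent: 5 },
];
```

`chooseBackend` is sticky: a given visitor stays on the same backend for a category until the percentage changes. `pickWeighted` is not, since each call draws a fresh random number unless one is passed in. Which behavior fits depends on whether a visitor's requests need to land on one platform consistently.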
Beyond supporting these complex routing rules, LaunchDarkly also allowed Regis to give product managers control over the rules within their scope (e.g., web requests for high-profile partners). This spread the burden of the migration across more stakeholders in the organization. Regis explained: "With LaunchDarkly, business users and application software engineers could use the same A/B testing framework that they’re used to, to manage traffic flows across the whole infrastructure as well as within the application itself. For that and many other reasons, this cloud migration was the smoothest, most uneventful of any I've ever managed."
Since using LaunchDarkly for releases, TrueCar has gone from deploying features once a week to 20 times a day. It's an astonishing shift. With Waterfall in the rearview mirror, the company is now doing continuous integration and continuous deployments (CI/CD). LaunchDarkly has also reduced the time engineers must spend managing features for the product team. SREs can instead devote time to making the infrastructure more stable, performant, and reliable. Meanwhile, product managers can run their own beta tests, iterate on features more extensively, and release in a progressive manner. Finally, despite the high stakes and remarkable complexity, TrueCar migrated its entire website infrastructure without any major issues. Perhaps even more stunning, product managers – not engineers – managed a significant portion of the project. The TrueCar ecosystem is thriving. The customer experience is better than it's ever been. And according to Joshua Go, TrueCar's Senior Director of Data Platform, LaunchDarkly "has played a huge role in driving those outcomes."
"Business users and application software engineers used LaunchDarkly's A/B testing framework to manage traffic flows across the whole infrastructure. For that and many other reasons, TrueCar's AWS migration was the smoothest, most uneventful of any I've ever managed."
Regis Wilson, Site Reliability Engineer (SRE), TrueCar