Testing in Production on a Distributed Team

Use feature flags to safely test software changes from anywhere, in any time zone.

In an effort to connect with our community despite being physically isolated, we decided to take LaunchDarkly's Test in Production Meetup to Twitch. On March 27, Heidi Waterhouse, a Developer Advocate at LaunchDarkly, inaugurated our first-ever live stream of the event with a talk befitting the times: how to safely test, deploy, and release changes when your entire software organization is remote.

Heidi reminded us that, no matter how hard we try, it's virtually impossible to predict the behavior of a feature in production—which is why testing in production is so important. Thankfully, developers can use feature flags to safely deploy code without exposing it to users. Once it's clear that the code hasn't broken anything, those in Product or Marketing can release the code at their leisure without needing developer intervention. And this can all be done quite easily from anywhere.

Watch Heidi's full talk to learn how to reduce risk when testing in production, pick up feature flag best practices, and get general tips for working remotely.

Join our next Test in Production Meetup on Twitch. If you'd like to see how LaunchDarkly can help your remote software org safely deploy changes to production faster, click here.

“Testing isn’t a point. It’s not a single moment in time. Testing is a state. It is the state of being reasonably sure that your software is running and working as you predicted.” —Heidi Waterhouse (@wiredferret), Developer Advocate, LaunchDarkly

FULL TRANSCRIPT:

Heidi Waterhouse, Developer Advocate, LaunchDarkly:

All right. This is asynchronous testing on a distributed team, because for all of us, our desk chair doesn't roll far enough to get across the country to talk to our teammates. So how are we going to do testing in a world where we're distributed, if we haven't already set up that system? Testing feels really risky to us. Testing is one of the things that we worry about because, literally, testing is breaking. If you run a test and it hasn't broken at some point, then do you really know if the test is testing the right things? That's the concept behind test-driven development: we write a test that breaks unless the code passes, and then we write the code to pass the test. Most of us don't do that extremely rigorously, but it's still part of our understanding of testing that there should be a breakage point.

Testing is also breaking because we're introducing changes. We theoretically have a stable running state and then all of a sudden we're going to introduce something that may or may not work, and if it changes the running state to something other than running, then we're in trouble. We're altering behavior. We know how something works now and testing is not the way it actually is going to work in the future. One of the things that we do to deal with the fact that we are anxious about testing is that we have a lot of process around it. We say, okay, here's what the test suite looks like. Here are the automatic tests. If we're lucky, we have a QA team that also runs a bunch of other tests to help figure out what's going on, and we have this approvals process. You can't commit until you've passed these tests. You can't release until you've passed these tests.

All of this process gives us more comfort with the fact that testing is breaking. But what if your changes, the changes that you make, those risky changes, were invisible? What if you weren't taking a risk when you tested, because nothing was showing to the outside? This is why we're called LaunchDarkly: we want people to be able to launch things into production without showing them to everyone. It's really, really important when we're talking about feature flags to understand that deployment is not launch, it's not release. Deployment is getting your code to the place it needs to be in production.

Launching, or release, is the business value of letting other people see that. When you separate those two concepts, you're going to have a much easier time understanding how you can test something in production if it's not showing to anybody. When we're doing testing, the first thing we do is test locally. We make sure that, in the phrase of the joke, it runs on my machine. You want to make sure that it's doing the things you expected it to do in a local, controlled environment. Then we do an integration test. We commit it and we say, okay, in the process of committing this code, have I broken anybody else? Is all of the code working together in a way that seems holistically sound? Because if we don't have that automation, it's really easy for us to introduce a breaking change and not know where it is, because a lot of us are contributing to our codebases at the same time.
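As an illustration of that deploy-then-release separation, here is a minimal sketch. The flag key and checkout functions are made up, and the initialization follows older versions of the LaunchDarkly Python SDK (newer versions evaluate against a Context object rather than a user dictionary):

```python
import ldclient
from ldclient.config import Config

# Initialize once at startup; "YOUR_SDK_KEY" is a placeholder.
ldclient.set_config(Config("YOUR_SDK_KEY"))
client = ldclient.get()

def old_checkout():
    return "old checkout flow"

def new_checkout():
    return "new checkout flow"

def checkout(user_key):
    user = {"key": user_key}
    # The new code path is deployed to production, but it stays
    # invisible until someone releases it by flipping the flag.
    if client.variation("new-checkout-flow", user, False):
        return new_checkout()
    return old_checkout()
```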

The solution is to be able to test very, very quietly on production, because production is so much weirder than any test environment you can come up with. Production involves users. And there are a lot of genius QA people out there, people whose mindset for security and QA is almost exactly the same: what happens if I hit that nail with a screwdriver? Not "it should work if," but "I'm curious about whether or not I can order negative one beers in a bar." QA can catch a lot of these things, but they're never going to catch the full weirdness of a production environment, especially in a software-as-a-service world where we have a bunch of dependencies on other pieces of software, on other streams, on other providers. We have this microservice architecture that is just really impossible to replicate in a local system. We're going to have to test it on production, but we don't want to do it in a way that's a breaking change for our users or for our systems.

Testing isn't a point. It's not a single moment in time. Testing is a state. It is the state of being reasonably sure that your software is running and working as you predicted. If we stop thinking of testing as, I ran the test and at that moment in time everything was working, and we think of the tests in the test suite as covering the state of something, as analyzing the ongoing health of something, then it's a lot easier to understand that testing could involve moving things in and out of production because the state of production is being tested, not that one commit. Let's go back and say that again. Testing isn't a point in time. It's a state. It is knowing the health of your system, and even if we've committed something, we need to be able to say it's working sort of or it's not working or it's mostly working or it's working perfectly.

Those could all be true for the same piece of code depending on the different conditions of the environment that it's in. When we tested this webcast, we tested it at night because that's when we had time, and we didn't have any problems with our bandwidth. Well, now everybody's online and we have this problem. So we did test it, and at that moment it was fine, but now we've tested it again in production and we've discovered that daytime hours have bandwidth constraints that we're going to have to pay attention to. This is an addition to our known state of the world. Feature flags create a superposition in your code. I ran into the bedroom and told my wife this last night while she was trying to go to sleep and she's just like, "Ah," because I'm that kind of obnoxious person, but it's a really cool idea. When you look at quantum superposition, or when you even think about the light particle-versus-wave experiment, what we're saying is that something can be two divergent things at exactly the same moment depending on how we look at them. We're doing this double-slit test.

We're saying that, depending on how you measure it, it's either a wave coming through this slit or it's a photon coming through this slit. It's so cool. When you have feature flags, depending on whether you're in the group that is flagged to get the new change or whether you're in the other group, you're getting a different result even though the code is identical. All that's happening is it's evaluating your context and giving you a different result. So amazing. I love the idea of quantum superposition as a way to describe flags, because I've been saying for a while that a feature flag is sort of like Schrödinger's code: it's on and off at the same time. When you deploy feature-flagged code, you're saying, I want the ability to change the state for people who are in this group or that group. Really powerful for testing, because you can use it to test acceptance, you can use it to test throughput, you can use it to test soak times.
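A toy evaluator makes the superposition concrete: one flag definition, two observers, two results. Everything here is hypothetical and stands in for what a real flag SDK does when it evaluates a context:

```python
# One flag definition; which variation you observe depends on your context.
FLAG = {
    "key": "new-search",
    "targets": {"beta-testers"},  # groups that see the new variation
    "default": False,
}

def evaluate(flag, context):
    """Same code for everyone; the result diverges per observer."""
    if set(context.get("groups", [])) & flag["targets"]:
        return True
    return flag["default"]

alice = {"key": "alice", "groups": ["beta-testers"]}
bob = {"key": "bob", "groups": []}

assert evaluate(FLAG, alice) is True   # Alice observes the new behavior
assert evaluate(FLAG, bob) is False    # Bob observes the old behavior
```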

Sometimes you really just need something to run for a while to understand whether or not you've introduced a memory leak, but you don't want to do that for everybody, and you can't really do it for yourself, because your own testing, even if you have some kind of load-testing mechanism, is not going to detect that the way thousands of people running it on their systems will. What does that mean for you? Well, feature flags give you the ability to test your code without disrupting anyone. You want to be able to test things without messing with the user experience, certainly for the majority of people, and ideally for everybody. It allows you to check otherwise unknowable facts. Like I said, you're not going to be able to replicate the weirdness of production in any kind of test environment. You're going to have to be able to test in a way that answers things that you couldn't have known on any kind of normal system that you're used to using. You're going to be able to check unknowable things more securely.

You're still not going to be able to answer unknown unknowns entirely, but it gives you more of a head start on, hey, something funny is happening here, let's put some observability on that and ask questions that we didn't know to ask. I think that's really the key point of observability systems: a monitoring system allows you to get answers to questions you had; an observability system allows you to get answers to questions you didn't know you had. Finally, I want you to be able to orchestrate releases, not deployments. As developers, we frequently think of ourselves as people who are pushing code, and we are, but we also need to understand that code isn't the business value. Code helps provide the business value, but the business value of what we're building is actually the ability for people to do things. There's this great book by Kathy Sierra called Badass: Making Users Awesome, and she said this thing that resonates with me so much. She said, "Nobody wants to use an ATM. We want money, we want cash. The interface is just something that we have to deal with in order to accomplish what we want." I think that's true of most software. Nobody sits down and says, "You know what I'd like to do? I'd like to use Word today. That's going to be fun." No, we want to write a document. We need to get some work done.

When you remember that release is about the business value, about the ability to get work done, it's easier to understand that testing your deployment is key, but it's not actually where the value is delivered. Again, separating deployment and release is going to give you a lot more insight into that. Okay, so I have hypothesized for a while, and now we are going to get real. What could you do with flags for testing? You can test load with progressive delivery. If you've ever been near a database, you know that database queries are hard, and it is easy to be subtly wrong in a way that's going to cause a ton of churn and traffic and bog the database down, but only at scale, and you're not going to be able to tell if you've screwed that up until you've had it out at scale for a while.

If you feature flagged this new query, what you're going to be able to do is A) turn it off if it's causing a big problem, but also maybe deploy it to only 5% or 10% and do an analysis on whether the amount of load increase is something that you can sustain over a full rollout. That gives you a chance to predict whether you can handle that kind of load or whether you're going to need to do some elastic scaling. You also want to let new features soak. It gives you better data about how they're being adopted, about unexpected side effects, and about things that you can't anticipate until it gets out in the real world. And if you're doing progressive delivery, it gives you a chance to say: at 10%, we started having some weird spikes. Let's look at that and see if it's something we can fix. I love this. This is sort of the anti-DevOps.
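Percentage rollouts like that 5-10% slice are usually implemented by hashing the user key into a stable bucket. This is a common approach rather than any specific vendor's algorithm, and the flag key is made up:

```python
import hashlib

def rollout_bucket(flag_key, user_key):
    """Deterministically map a user to a value in [0, 1) for this flag."""
    digest = hashlib.sha1(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest[:15], 16) / 16**15

def in_rollout(flag_key, user_key, percentage):
    # The same user always lands in the same bucket, so widening the
    # rollout from 5% to 10% only adds users; nobody flaps between paths.
    return rollout_bucket(flag_key, user_key) < percentage / 100

if in_rollout("optimized-query", "user-123", 5):
    pass  # run the new, riskier query for this 5% slice
```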

Developers can throw something over the wall to Product and say, "Release it when you're ready. We're done with it. We feel confident it's been tested. You can handle it from here. You can turn it on yourselves. We don't need to have tickets come back to us that say, will you please turn on the thing?" If you have feature flags, you can hand off all of that release stuff to your product or marketing [people] or whoever is in charge of that and not have to circle back to it. All right. Here's another example. What if you want to test around the world? You're all sitting in your home offices, maybe with small children climbing on you. Synchronicity is hard. It's hard to say, "Okay. At 12:00 Eastern, we're all going to do the thing that we need to do for release." Why? Why would we need to do that? If you have committed your part and done your testing, then you can say, okay, my part is ready to go. I've flagged my thing ready to go, but it's not going to go live until we get the approvals process from somebody else who says, okay, all of the things are ready to go. Now we're going to turn on the leader flag that says roll out the whole thing.

Consensus about whether something passes testing can be asynchronous. There is this huge problem right now where not only are we working from home, a lot of us are working from home with other commitments, and maybe 2:00 PM is a really bad time for you because it's kiddo nap time. If you're on the phone and the kid wakes up, then there's just disaster and screaming. What if you could do this without having some kind of verbal coordination, or even time synchrony? We don't need that for testing. We're used to having it, but we don't really need it. There's not a thing that we must do together. Finally, one of the things that I really like is the ability to hook your flag API into your monitoring and say, look, if there's a 1000% spike, I want you to turn off that inbound feed stream. Say your database is consuming data from a third party and all of a sudden there's a huge spike.

Well, your option is either to consume that huge spike, overwhelm your database, and probably degrade performance for your other customers, or to say, "Something is wrong over here and I'm not talking to you until you figure it out." That gives you a lot more ability to protect your whole customer base from the rogue actions of somebody's third-party thing. It's not necessarily malicious. I think all of us are dealing with "why is there this sudden traffic spike," because it turns out that our backbone is not really built to have quite this much traffic on it. If we build in automated responses that can turn things off or turn them down immediately without requiring a human response, then we're giving ourselves a lot more leeway around how to make sure that things continue to work for everybody. That is my presentation. I'm super excited to take requests. We have a URL; if you follow it, you can fill in a form and get a T-shirt. I'm not going to guarantee they're going out immediately, but I promise we will keep track of them. Thank you.
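A sketch of that automated circuit breaker might look like the following. The metric threshold, flag key, and flag-service endpoint are all hypothetical; LaunchDarkly exposes a REST API that can flip a flag programmatically, but the details depend on your setup:

```python
import urllib.request

BASELINE_RPS = 1_000  # normal inbound feed rate, requests per second

def kill_flag(flag_key):
    # Hypothetical endpoint standing in for whatever your flag service
    # exposes for turning a flag off without a human in the loop.
    req = urllib.request.Request(
        f"https://flags.example.com/flags/{flag_key}/off", method="POST"
    )
    urllib.request.urlopen(req)

def on_metric_sample(current_rps):
    """Called by the monitoring loop with the current feed rate."""
    if current_rps > 10 * BASELINE_RPS:  # a 1000% spike
        kill_flag("inbound-feed-enabled")  # stop consuming until a human looks
```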

Yoz Grahame:

Brilliant. Thank you so much, Heidi. 

Excellent. Thank you so much. There were fascinating points raised in that. I mean, given that both of us work at LaunchDarkly, we're used to different aspects of learning about testing in production and learning about the different uses of flags. Having said that, I'd never thought of the superposition, the quantum superposition idea, before. [Inaudible] site reliability engineer, I can just imagine. I mean, keeping SREs locked in boxes with bottles of poison is, I think, what a large number of companies are basically currently doing.

Heidi Waterhouse:

I'm a little worried that's how it's working. Yeah.

Yoz Grahame:

Yeah. Being able to add certainty helps massively here. The point that testing isn't a point, it's a state, is something that particularly resonated. It's very easy to fall into that mindset of thinking, we did testing here, right? It was okay.

Heidi Waterhouse:

Yeah. It's tested, it's great. I'm like, "Well, yeah. I ate lunch but I still need dinner. It's a continuous process."

Yoz Grahame:

Right, and it's amazing how things that seem to have been tested and were fine can suddenly go south despite no deployments having happened, no configuration changes having happened. I think there's a degree to which when you are taught computing and engineering, you are raised thinking that everything is nicely deterministic. Everything works the way it says it should. Then-

Heidi Waterhouse:

That's why we have Ops people. They know better.

Yoz Grahame:

They have learned the hard way. They are the embittered, snorting-with-laughter types who are ready to jump into action at any moment. Watching things deteriorate in an almost organic or biological way, and having to deal with that, is common practice. Have you seen anything about how that changes QA practice, in terms of thinking about it that way?

Heidi Waterhouse:

Well, I think one of the interesting things that has happened over my career is that we have fewer QA teams, because we have more emphasis on developers doing their own testing and more emphasis on test suites, all of which are necessary but not sufficient. We really do need that breaker-attacker mindset. And as for the weird entropy that code that should be stable experiences, I think a lot of that has to do with developers visualizing clean inputs. I've been saying for a couple of years that we need to be preparing Unicode for all of our inputs, because emoji-named children are coming, and it turns out that there is an emoji-named company out there now.

Yoz Grahame:

Really?

Heidi Waterhouse:

Yeah. They have a literal Unicode name. I'll have to go look it up, but yes. Is your system going to break if the fashion or style of inputs changes, and have you bounded your input sufficiently to prevent that? I think that's a really common breakpoint for people.

Yoz Grahame:

Yeah. Testing the edge cases is great. I always remember a colleague of mine used to use the Unicode symbol "heavy black heart," which apparently is a great tester for any tech pipeline or data pipeline you have. Somehow it manages to touch all the edges, so he'd use it whenever testing any kind of Unicode pipeline, because if there was a bug in there, "heavy black heart" would find it.

Heidi Waterhouse:

That's good to know.

Yoz Grahame:

Yeah. There are some great tools out there for doing this, especially for fuzz testing or property testing. There was a great talk at PyTennessee, which I was at a month ago, about Hypothesis, which is a Python library. There are equivalents of this that will understand what types you are testing with and then deliberately push all the edges of those types. But that, again, is great for unit testing, where you're testing very tightly defined bits of code running nice and neatly inside one process. For testing very large, multi-system integrated pipelines, there's nothing out there that really tests in that way that I've seen. Being able to react quickly, being ready to do that actual testing yourself, and being able to wall that off is incredibly useful. The superposition aspect is actually something we get a lot of questions about as well, isn't it?
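For a flavor of property testing, here is a minimal Hypothesis example; the slugify function is a made-up unit under test. st.text() generates exactly the kind of hostile strings discussed above, including emoji such as "heavy black heart" (U+2764):

```python
from hypothesis import given, strategies as st

def slugify(title):
    """Made-up unit under test: a naive slug generator."""
    return "-".join(title.lower().split())

# Hypothesis pushes the edges of the str type: emoji, combining marks,
# right-to-left text, control characters, and the empty string.
@given(st.text())
def test_slug_never_contains_spaces(title):
    assert " " not in slugify(title)
```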

Heidi Waterhouse:

Yeah.

Yoz Grahame:

It's something people worry about: how do you test for all the different combinations of flags that you could get?

Heidi Waterhouse:

I have a-

Yoz Grahame:

What advice or questions given that... Yeah. Yeah.

Heidi Waterhouse:

I have a partial answer to that. It's not a full answer, but what I've been saying is: test your current state, because if you don't test your known state, you don't know if the new thing is what's breaking. Test with all your flags on, test with all your flags off. These things may break, but they should break in a predictable way. And then test the delta. If everything is in its stable state except this one flag, test that. It doesn't cover all of the combinatorial possibilities, but it covers enough of them to give you a sort of mathematical head start. Then if it turns out you're having a behavioral problem after that, you always have the ability to turn the flag off really fast.
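In test-suite terms, that advice might look something like this sketch; the flag keys, states, and the render_app stand-in are all hypothetical:

```python
import pytest

FLAGS = ["new-search", "dark-mode", "optimized-query"]
CURRENT = {"new-search": True, "dark-mode": False, "optimized-query": False}

def render_app(flag_states):
    """Stand-in for exercising the system under a given flag configuration."""
    return "ok"

# Not all 2**n combinations: the predictable baselines plus one delta.
CONFIGS = [
    {f: True for f in FLAGS},        # everything on
    {f: False for f in FLAGS},       # everything off
    CURRENT,                         # the known production state
    {**CURRENT, "dark-mode": True},  # known state plus the one flag under test
]

@pytest.mark.parametrize("flag_states", CONFIGS)
def test_known_flag_configurations(flag_states):
    assert render_app(flag_states) == "ok"
```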

Yoz Grahame:

Yeah. That's incredibly handy as well. Being able to flip things immediately when you notice there's a problem is super useful. That's an interesting one. Testing with everything on and everything off, I can imagine, is kind of quite scary.

Heidi Waterhouse:

It's terrible. Like I said, it frequently breaks, but your broken state should still be predictable. You should know what kind of broken it looks like. For example, LinkedIn had this problem a few years ago. Something happened in their flag management system such that all of their flags were on simultaneously: the ones that say, hey, register for Pro, and the ones that say, hey, thank you for registering for Pro, at the same time. It was really bad. They had to roll back a long way because they didn't have a good state.

Yoz Grahame:

Right.

Heidi Waterhouse:

Yeah, because it was not actually feature flags, it was more like variables.

Yoz Grahame:

Right.

Heidi Waterhouse:

Right, but while it was broken, it was still serving a page. It wasn't the user experience that you wanted, but you could see what was happening. I'm not saying expose that to the world. You don't want to do that. But I am saying that testing them all on and all off tells you exactly where the delta is, the delta in the physics sense, the change from baseline. The thing that I've been saying about testing and flags is: it's really hard to lose $1,000 at penny slots. If your bet is very small, your odds of losing a lot of money are very small. It's not hard to lose $1,000 at $100 slots. I don't even know if those exist. You can absolutely lose $1,000 because you're making bigger bets. The smaller and more incremental the bet you're making with each feature flag, the less likely it is to be catastrophic. You lost a penny.

Yoz Grahame:

This is the aspect of keeping the blast radius small, really, isn't it? Once you know that the cost of a bug is substantially reduced, then you feel freer to take risks in ways that actually end up being fine. Being able to do that flag flipping on and off is incredibly scary, but at the same time you're in control, right? You're watching the results live, so you can fix it immediately if there's a problem. We have a few different questions coming up in the chat, and thank you very much to everybody who's joined us so far. Let's start at the top. From Zena: what's the top piece of advice for staying productive during this time of forced working from home?

Heidi Waterhouse:

What's my top piece of advice for staying productive while working from home? First of all, your value as a human is not related to your productivity. I think we need to say that really clearly. Your value as a human is not about productivity. That said, I think all of us feel better if we're doing something useful, and a useful thing that we can do is keep working. To be productive, I think it's really useful to remember that we had all these rituals for going to work and for being at work. One of the things that I do, and have done for a long time because I work from home anyway, is I have a commute. In the summer, when it's not quite as wintery in Minnesota, I sometimes go for a bike ride before work.

In the evening, I go and I sit on my couch downstairs without talking to anybody after work for half an hour so that I can change gears from work to home and come upstairs sort of free of all of the work stuff. I think whatever rituals you need to be productive are important to preserve, even if they feel slightly ridiculous when you're not leaving the house. We have a sales guy who is ironing his shirts and tying his neckties for webinars. I'm like, "Good for you." If that's what it takes for you to feel productive, you should absolutely do it. If what it takes for you to feel productive is working a split shift where you work four hours in the morning and then you swap with your spouse to do childcare and then you come in and do a few hours at night, that's okay. It is whatever you need to get through this.

Yoz Grahame:

Yeah. I think one of the most useful refrains that I've continually heard during this is that self-kindness is important, right? Nobody is working at 100% capacity right now. It's learning that it's okay to make mistakes, because everybody's making mistakes as we do this. Everybody is winging it. You are not the only one. We are being more tolerant of each other, because we know we're all dealing with this ourselves. Trying things out is a great idea, because you're not sure what the best way to do it is. It's part of the Agile concept of continual rapid iteration: learning what works and what doesn't and tweaking as you go. Thank you for that. Yeah.

Heidi Waterhouse:

What's our next question?

Yoz Grahame:

We have somebody, Careful Dez, who says: I think as engineers we might struggle with how to identify what a feature flag is, and with how to write code using feature flags so the code does not become too complex. Any ideas or advice about that? Any ideas about how to write code using flags so that it doesn't become too complicated?

Heidi Waterhouse:

All right. This is actually a really key question, and I'm glad you asked. It's important to understand that there are two main types of flags. There are semi-permanent flags, which I sort of think of as the ops and sales flags. Those are things like entitlements that turn access on and off, and you want those to stay around permanently. Then there are temporary, ephemeral flags that are going to be in your code only as long as you need them, so that you can do the deployment and the testing. Then you're going to remove them, because you don't want to leave a ton of flags sitting in your code. That's a threat surface, not necessarily in a security sense, but in an accident sense. It's sort of like leaving Legos all over your floor. We want to take out the Legos and play with them, but we also want to pick them up, because there are few things worse than stepping on a Lego in the middle of the night.

In the same way, you want to pull those temporary flags back out of your code and set up a hygiene cadence to do that. Otherwise, your code does become too complicated. I do think that feature flags are sort of a clarifying force for what a feature is. This is a discussion a team has to have together: what is the size of a feature? Frequently, a feature has elements that a lot of different teams or people work on, and maybe there's, again, this leader flag that dictates the feature as it appears to users. If you add check scanning to your banking app, well, you need image recognition, you need camera access, and you need a whole bunch of different things. Each of those would be a sub-feature that had its own flag for testing, and then they would roll up to this leader flag that would be the feature. As a team, when you're designing your software with feature flags, you're going to think about what is a testable unit, a minimum viable unit to flag.
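That check-scanning decomposition might look like the sketch below; all of the flag keys are hypothetical:

```python
# Hypothetical flag keys for the check-scanning example.
LEADER = "check-scanning"
SUB_FLAGS = [
    "check-scan-camera",
    "check-scan-image-recognition",
    "check-scan-upload",
]

def flag_on(key, flags):
    return flags.get(key, False)

def show_check_scanning(flags):
    # Each sub-feature is flagged and tested on its own; the user-visible
    # feature appears only when the leader flag is on and every part is ready.
    return flag_on(LEADER, flags) and all(flag_on(k, flags) for k in SUB_FLAGS)
```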

Yoz Grahame:

Right. It's great. It's interesting to think about it in that way; I hadn't really thought about that as one of the uses of flags. It's fascinating to see the byproducts of practices in engineering. Sometimes the byproduct of the practice is just as useful as the main benefit, and in this case, using flags to define the seams of where a feature is and where the sub-features are is a useful practice in itself [crosstalk 00:32:20].

Heidi Waterhouse:

Sort of in a throwback way, it's very object-oriented. Like what is the object that I'm flagging?

Yoz Grahame:

Right. There's something else with the flags. I think part of the advice that we give is: with a flag, try to have the flag do the smallest possible thing, and name and describe the flag in terms of that fact.

Heidi Waterhouse:

Smallest useful thing.

Yoz Grahame:

Exactly.

Heidi Waterhouse:

Not just the smallest part. You could have a lot of flags that control un-useful things, like whether this variable is set or not. Whatever the useful unit of work is, and that's a team-by-team thing. We know teams who flag really enormous things and manage fine that way. We also know teams who say maybe this one day of work for one developer goes under a flag. It depends on your team dynamics.

Yoz Grahame:

Right, right.

Heidi Waterhouse:

The other thing... Hang on.

Yoz Grahame:

Yeah.

Heidi Waterhouse:

The other thing about finding the seams is that it's really useful when you're doing decomposition of a monolith, because what you do is you flag the whole monolith, and then you create the replacement microservice for the part of the monolith it's replacing. As you replace it, you can turn down the monolith and make sure that your replacement microservice is actually taking the load that you expect it to. You can identify exactly which part of the monolith you're breaking out and replace it using a flag, without having to do an abrupt cut-over. You can sort of do a leveling.
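Here's a routing sketch of that gradual cut-over; the flag key, percentage, and handlers are hypothetical, and in practice the percentage would come from your flag service rather than a constant:

```python
import random

def rollout_percentage(flag_key):
    # Stand-in for a flag lookup; in practice this comes from the flag
    # service and can be dialed up or down (or to zero) at any moment.
    return 10.0

def monolith_billing_handler(request):
    return "handled by monolith"

def billing_microservice(request):
    return "handled by microservice"

def handle_billing(request):
    # Send a slice of traffic to the extracted service and watch it take
    # the load; drop the percentage to 0 to fall back instantly.
    if random.random() * 100 < rollout_percentage("billing-service-migration"):
        return billing_microservice(request)
    return monolith_billing_handler(request)
```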

Yoz Grahame:

Especially with percentage rollouts, there's something, and I think this has been a topic for both of us, about the user context that you provide for a flag. We tend to think of it as users, but it doesn't have to be a user. It can be a server.

Heidi Waterhouse:

It can be a server, it can be an endpoint.

Yoz Grahame:

Right. Some percentage of anything that you want to gradually increment, so you can see it take the load and validate that it's working. Yeah, that's fantastically useful. Just to round off on using flags to write code while avoiding complexity: what we've got is describing the code in terms of flags; making sure that you have good hygiene around flags, cleaning up regularly, especially temporary flags for rolling out features; and making sure that flags do the smallest useful thing. There's something that we talk about a lot when teaching with one of the demos we have, about dark mode in an app. If you want to add a new setting that enables dark mode in an app, the flag doesn't have to be wrapped around the whole dark mode code. It can just be wrapped around the bit of the UI that makes the dark mode checkbox or menu item appear, which will turn it on and off for everybody without actually changing the main logic.
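That dark-mode pattern in miniature; the flag key and rendering functions are made up:

```python
def render_settings(flags, prefs):
    items = ["profile", "notifications"]
    # The flag gates only whether the setting is offered. The dark-mode
    # rendering code below ships unflagged; it's inert until a user opts in.
    if flags.get("show-dark-mode-setting", False):
        items.append("dark-mode-checkbox")
    return items

def render_page(prefs):
    theme = "dark" if prefs.get("dark_mode") else "light"
    return f"<body class='{theme}'>"
```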

Let's see what other questions we have here. We have Ellie asking for advice on iterating on the work-from-home processes that we're all figuring out. I suppose we talked already about trying to identify useful work-from-home practices that many of us are getting used to. You've been doing it the whole time so far. Some of us, and I'd definitely much prefer working in an office, because it provides that neat change in environment, going into work mode, going out of work mode. As you were saying, having some way of making that change happen at home is useful. How about teams? As teams transition into work-from-home mode and try to come up with better practices, have you seen any interesting things there about trying to make that migration work for a team?

Heidi Waterhouse:

We use a tool called Geekbot, which I find super useful. It asks us for essentially our standup status, including, and this is really cool, how do you feel today? Answer with an emoji or GIF. That gives us this sort of sense of where everybody is, feeling-wise. If I see that somebody has posted just a facepalm GIF, I know that they're already having a rough morning and I'm going to take it a little easier on them. If I see that somebody has posted a bicycle GIF, I can ask them if they got to go out on a ride today. It gives us this little team bonding moment, but it's also completely asynchronous. It's not like Geekbot happens at the same time for all of us; mine asks me at the same time in the morning, but I'm in a different time zone than most of my team.

I can just see these team check-ins roll in without us all having to be on the phone at the same time. I think that there's a lot of asynchronous stuff that has been percolating around that we haven't really needed. I think one of the great resources to look to is the open source community because by definition, almost all of them are working remotely from each other and still trying to figure out how to coordinate. When we're looking for advice on that, go talk to... Go see what OSI has, go see what your favorite open source developer does for their side project because you're going to get a lot of insight on how they can accomplish really impressive things without being physically co-located. I think as far as iterating on it and understanding what's happening, I have, it's not really a bullet journal, it's kind of a bullet journal.

I just have a list of things that I have done in my day that helps me sort of calculate how productive a day has been. Then, if I'm really on the ball, I write in whether I did something new, like: did I set up my meds so I remember to take them? Did I go for a walk before work? I can sort of correlate that. I use a tool called RescueTime. I installed it in 2016, and it locks me out of Twitter for the first half-hour of my day, and it monitors-

Yoz Grahame:

Very [inaudible 00:39:23].

Heidi Waterhouse:

Yeah. It turns out existential dread is bad for my productivity. I can't-

Yoz Grahame:

You don't say.

Heidi Waterhouse:

I can't with news in the morning. This is actually a really important thing for me is I can't read news in the morning because I just get so anxious about everything that's going on, I can't work, and so my morning routine does not have any news or outside information in it. The other thing-

Yoz Grahame:

That's great because my morning routine at the moment is just kind of look at Twitter and then curl back under the duvet for another three hours.

Heidi Waterhouse:

Right. Yeah, exactly. The other thing that RescueTime does for me is it prompts me to... It's tracking where I'm spending time in my apps, and then it prompts me to say what I was doing. Like, you spent a bunch of time in Google Slides. Now I can say, okay, I was writing this talk, and I can go back and look at that historical data. It's really nice.

Yoz Grahame:

That's great. I need to try that again. I did try RescueTime a decade ago or so. The problem, and I know that you've spoken a lot about ADHD and I have similar battles with it, is that it is so hard to build and maintain these kinds of habits, and especially to keep track of what you've actually been doing. Even when it turns out that you've actually been quite productive, you forget that, and so unfortunately you end up beating yourself up despite the fact that you've stayed focused.

Heidi Waterhouse:

We have a team Trello board where we put all of our projects, a Kanban board: this is what I'm doing. But we've taken to adding a mini-task checklist of all of the requests that we've gotten and fulfilled. That's been super useful, because it turns out that at least in the last couple of weeks, when people are on lockdown, we've been really reaching out and needing human connection. I've been spending so much more time in meetings and socials than I would at other times, so it's been really useful for me to say, okay, I only got so far on these projects, but I also did these 10 pop-up things that came up, and that's where my time has been going. It's a thing to look at, and it's a way to say, we're all figuring this out. How many meetings is too many meetings?

Yoz Grahame:

Yeah.

Heidi Waterhouse:

Is pairing a better option? Is total silence a better option? Everybody works differently. It was just easier to self-regulate in an office.

Yoz Grahame:

Yeah. Yeah. Particularly on what you said about meetings: the initial reaction when the whole working-from-home and self-isolating thing started was that the meetings disappeared for a while, and now they're back.

Heidi Waterhouse:

It's kind of great.

Yoz Grahame:

Yeah. Now they're back, kind of with a vengeance, as it were. We're having fun using Snap Camera and all kinds of things to be pineapples or potatoes. But there's still an awful lot of the workday broken up in that way, and that can disrupt productivity.

Heidi Waterhouse:

I think for a team, maybe you say... We have a no-meeting Wednesday at LaunchDarkly, but I think maybe for your team you could say, we're not going to have any team meetings in the morning, so block your time. You're having a meeting with yourself so that you can get some focused work done.

Yoz Grahame:

Right. We actually have a question about that at the moment: for someone who procrastinates a lot, what do you think is a good structure for using your time to program? I think the first thing, which you just mentioned, is blocking your time off in advance, specifically planning blocks of time around that. Any other advice on that?

Heidi Waterhouse:

Accountability buddies. That doesn't have to be somebody that you work with. Maybe you have a friend who you know is also struggling and you're like, okay, we're... I do this with writing all the time because I love writing, but it's hard to get into. I'm like, "Okay, we're going to do a one hour writing sprint and if I see you online then I'm online too and we should both yell at each other." Having that feeling that you're both working on this together and you're going to report back to each other about how that's working. The other thing, procrastination is such a huge problem for me. The worst part is productive procrastination. I am capable of doing so much useful work that is not the work I need to be doing.

Yoz Grahame:

Yeah.

Heidi Waterhouse:

It's not like I'm goofing off, but I'm not doing the one thing that I really need to be doing, and what's useful to me in that moment is for somebody to explain to me why the one thing I'm supposed to be doing is crucial right now. What is the long-term implication of doing this? I don't respond well to, well, because there's a deadline. I do respond well to, well, I mean, I respond to deadlines somewhat, but only if they're backed by like, or you'll fail. What I do better at is if you get this out, here's the benefit to actual humans as opposed to the company.

Yoz Grahame:

Yes. That's an incentive. Unfortunately, we get used to incentivizing by punishment rather than by reward. I think for you and me, and this is something I have a lot of trouble remembering, it's about trying to focus on the benefits of whatever it is that we're currently avoiding.

Heidi Waterhouse:

Right, also-

Yoz Grahame:

Yeah-

Heidi Waterhouse:

Go ahead.

Yoz Grahame:

Exactly. It's remembering that, hey, that thing that we want to get done, people might really like it. Or that it'll be so good when we finally get it sorted and it's not weighing on our minds anymore.

Heidi Waterhouse:

I use a lot of micro rewards. I literally have a family budget for M&M's next to my desk. I get an M&M when I finish a paragraph and I'm like, "Yes, I am the most reluctant dog trainer ever."

Yoz Grahame:

Skinner box your way to productivity.

Heidi Waterhouse:

Exactly, because it turns out that we like dopamine, and we will do almost anything to get that consistent feeling of accomplishment. When you're coding, there are two different problems with how you do that. One of them is that there's a hump to getting started. This exists for everybody, but it's worse for people with ADHD: task switching into something that takes work is hard. I don't like Markdown, and yet I need to be writing in Markdown. Every time I open Markdown, I'm like, I hate this and it's not HTML and I hate everything. What I do is, I'm like, "Oh wait. Maybe I'll go do something else. Anything other than look at this Markdown."

Yoz Grahame:

Right.

Heidi Waterhouse:

Right. The thing that I did was I wrote it in Google Docs, and then I lowered the barrier from "write a thing and learn Markdown simultaneously" to "I've written a thing and now I'm just translating it to Markdown." Lowering that threshold to switch into a task is really important for me.

Yoz Grahame:

That's a great idea.

Heidi Waterhouse:

Once I get into a task it's fine, but sometimes, depending on how averse I am to doing the task, I have to break it down into: yay, I opened the window, reward for me. Literally, I opened the window, okay. I'm going to tell my friend, my accountability buddy, that I opened the window; she opened the window too. Okay, next step: write a sentence. Once you get going, the momentum is there. It's just switching into it that's really hard.

I think it's especially important for us to understand that in this time of pandemic and crisis, our cognitive load is already up to here. We are already churning on: what if I get sick, what if my parents get sick? Do I have enough rice? All of that is taking up brain space we used to have for, I wonder how that code is going to work. So everything that we can do to minimize cognitive load is going to help us be more productive. That includes not task switching. There are a bunch of studies, and I am such an inveterate task switcher-

Yoz Grahame:

Same here.

Heidi Waterhouse:

Yeah, because anytime I hit a block of any kind of difficulty, I'm like, "I'll just go do something else." Some people, it turns out, go, oh, that's hard, I want to dig into it. I want some of that.

Yoz Grahame:

Right. The thing is, I remember I used to be like that. It's interesting watching my son now, who has been similarly dealing with ADHD and has recently been digging into creating his own games and doing all kinds of amazing things. I see him just go down and focus. He suddenly gets interested, decides to work out how collision detection works in Unreal Engine or something, and he'll dive into it for two hours solid and come out with a new bit of his game, which is kind of amazing. Yet now, I think, with the added stress of everything happening, we're pulled into this state of having 100 windows open and certainly more than enough tabs to break Chrome in our browsers, with distractions flying at us every five minutes. Being able to turn those off for long enough that you can actually focus...

Heidi Waterhouse:

Yeah. It turns out that in order to get into something, you have to be bored. It's really hard. When you're thinking about structuring your day to be more productive, when you have very little cognitive resource left, you need to avoid task switching. You need to not have your day broken up by meetings. You need to be able to just open the one thing you need to be doing and get bored enough to do it, at least for me.

Yoz Grahame:

Yeah, yeah. Yeah. Being able to take emotions like boredom and other things that normally are so painful and use that pain for good, let the hate, let the boredom flow through you.

Heidi Waterhouse:

Exactly.

Yoz Grahame:

We've only got five minutes left. We have one more question that I'm going to go to, which is from Gary 4D, returning to the topic of feature flags. It's about what you were saying a couple of minutes ago about leader flags. You were saying it is sometimes useful to have smaller flags under a leader feature flag. Any good examples of leader flags?

Heidi Waterhouse:

This is the one LaunchDarkly loves to talk about. You may remember that several years ago, Atlassian rebranded everything that they do: Jira, Confluence, everything got a new icon, got new CSS, it all looked different. That happened when the CEO flipped a feature flag on stage. Instantly, all over the world, everybody had the new branding.

Yoz Grahame:

That was amazing. Yeah.

Heidi Waterhouse:

That was LaunchDarkly behind the scenes, because what had been happening is that all of the different teams responsible for the rebrand across Atlassian had worked on their special part: here's the CSS, here's the logo, here's the different font, here's all of the different parts of this rebranding. They had already deployed the new material, so nothing had to be loaded. All of those were behind individual feature flags that could be turned on and off for testing. It wasn't like the whole rebrand was under one monolithic flag; they all had individual flags, and then the individual flags fed up into this leader flag that was a simple Boolean on-off. It says show new branding, and the moment that flipped, LaunchDarkly, which uses a push architecture, pushed to all of the clients everywhere this new instruction to say use the new stuff instead of the old stuff. But because the resources were already there, nobody had to reload anything. Instantly, you get this new version, and it was so magical. It's super cool.

Yoz Grahame:

It's great. How do you actually do that in practice? How do you use the LaunchDarkly dashboard to set that up?

Heidi Waterhouse:

In practice, what you do is you create a dependency. In the LaunchDarkly dashboard, before we head to the rules, there's a thing that's called prerequisites, dependencies, something. I've written it both ways.

Yoz Grahame:

I think it's prerequisites, at the top of the targeting tab.

Heidi Waterhouse:

Prerequisites. You say this flag depends on leader flag and if leader flag is off in this environment, it won't show.

Yoz Grahame:

Right.

Heidi Waterhouse:

Of course, you can have different environments, so you could have the leader flag on in test, test that everything is going to go right, and still have the leader flag off in production. It gives you two levels of control. You're never going to see anything in production until the leader flag is on there. In test, you might turn the leader flag on and have five of 10 elements on, because that's all that's ready. Then if there's a problem, you could turn it off, fix it, turn it back on, and be able to see it all in test as it's going to look in production, and it's all the same code. You're not having any code divergence.
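A sketch of prerequisite evaluation across environments; the flag definitions and evaluation logic are a simplified stand-in for what the dashboard configures:

```python
# Hypothetical flag definitions, with one on/off dial per environment.
FLAGS = {
    "rebrand-leader": {"test": True, "production": False},
    "rebrand-css": {
        "test": True,
        "production": True,
        "prerequisites": ["rebrand-leader"],
    },
}

def evaluate(key, env):
    flag = FLAGS[key]
    # A flag whose prerequisite is off serves its off state regardless of
    # its own setting, so nothing shows in production until the leader flips.
    for prereq in flag.get("prerequisites", []):
        if not evaluate(prereq, env):
            return False
    return flag.get(env, False)

assert evaluate("rebrand-css", "test") is True         # visible in test
assert evaluate("rebrand-css", "production") is False  # hidden in production
```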

Yoz Grahame:

Right, right.

Heidi Waterhouse:

It's just what's being shown due to the user context.

Yoz Grahame:

It's all due to the flag setup. That's fantastic. Being able to have a whole bunch of these flags depend on this one flag, so you've only got one flag to flip when it comes to turning everything on and off.

Heidi Waterhouse:

Yup.

Yoz Grahame:

Makes it much simpler especially in an emergency.

Heidi Waterhouse:

There's also a great way to use prerequisites to say only people with version 1.2 and above get this feature because it breaks things otherwise. Prerequisites give you a lot of power to say if and only if.
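That version gate, reduced to a sketch; the attribute name and threshold are invented:

```python
def serve_feature(context):
    # "If and only if": the feature breaks older clients, so gate on the
    # version attribute carried in the evaluation context.
    major, minor = context.get("app_version", (0, 0))
    return (major, minor) >= (1, 2)

assert serve_feature({"app_version": (1, 3)}) is True
assert serve_feature({"app_version": (1, 1)}) is False
```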

Yoz Grahame:

Right, right. You can end up doing some quite powerful logic there. It's a nice way to modularize that logic: you have the complex rules around versioning in one flag, and then the other flags that use it refer to the logic in that one flag rather than having to re-implement it themselves.

Heidi Waterhouse:

Mm-hmm (affirmative).

Yoz Grahame:

Brilliant. Thank you. I think we're coming to time. It's now 11:00 AM. Hello. Thank you. Sorry about that; unfortunately I lost the channel, lost the home connection, at the end. Just saying thank you so much again for your talk and for answering questions. We have had many speakers over the years at the Test in Production event in San Francisco, New York, and London.

Heidi Waterhouse:

Berlin.

Yoz Grahame:

Berlin. Thank you. We are eager to bring some of those people back to talk, but we are more eager to have new speakers who have never spoken at Test in Production before. If you think you might be one of those people, please do let us know. Give me or Heidi a yell, or the LaunchDarkly account, @launchdarkly on Twitter. We would love to help you present your talk about testing in production, about site reliability, about feature flagging, QA, circuit breakers, any of those topics, and certainly the topic of how teams work together remotely, in sync or asynchronously, during working from home and self-isolation. I think we're all learning lots of new things there, even those of us who have been working from home [inaudible] till now. Please let us know. Thank you so much for joining us. We will be back next week on Thursday at 10:00 AM Pacific time.

Heidi Waterhouse:

If you have any other questions, you can go ahead and tweet at us with the hashtag #TestInProduction or just at @LaunchDarkly, and we'll get around to answering them.

Yoz Grahame:

Yes, please do. Any other questions that we weren't able to cover in this talk, just send them to us directly. You can see our Twitter handles there, or @LaunchDarkly, and we'll be very happy to get back to you. Thank you very much and see you next week. Thank you, Heidi.

Heidi Waterhouse:

Thank you.

Yoz Grahame:

Goodbye.
