Michael is a Solutions Architect at Betway and has worked across a number of the company's web teams, from Bingo lobbies to Sports homepages. He is also a Microsoft Windows Insider MVP and a fan of all things Star Wars.
It could also be that a customer calls our call center with a problem. The call center can turn on debug mode for that particular user, and we start getting a lot more logs coming through for just that one user. It's a nice way of leveraging the testing-in-production infrastructure to better understand what a particular user is experiencing, or a particular bug that might be present.

Moving on to the actual experimentation side of testing in production: it's really important to understand what success actually is, because most people will have a slightly, or maybe very, different opinion of what success looks like for an experiment, and it might not be clear from the hypothesis alone. Everyone involved has to understand the goal, and the success metrics have to be agreed: what number is ultimately going to determine whether this was a success or a failure? We've had some very good conversations about what success looks like for different types of hypotheses, and it's not often we've found everyone on the same page at the beginning. It's important to get there before you run the experiment; otherwise you run it, and then people question whether you even looked at the right things. Make sure you're all aligned before you spend any time running an experiment.

Off the back of that, it's also very important to understand the sample: who are you actually going to run this experiment on? It might be the obvious 50-50 A/B test, but maybe the person who suggested the hypothesis considers it fairly high risk and would rather target just 20% of users and compare them against the other 80%, which is interesting, and certainly something past conversations have surfaced. You can also target on all kinds of things we've already talked about here: device type, country, subdomains. All of that needs to come out. You need to understand where the experiment is going to run, and make sure everyone involved understands and is happy with that.
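To make that sampling discussion concrete, here's a minimal sketch of what deterministic assignment with targeting might look like. It's purely illustrative: the `assignVariant` helper, the config shape, and the hashing approach are assumptions about one common way to do this, not our actual implementation.

```typescript
import { createHash } from "node:crypto";

interface Targeting {
  countries?: string[];   // e.g. only run in certain markets
  deviceTypes?: string[]; // e.g. "mobile" or "desktop"
  subdomains?: string[];  // e.g. only the Sports subdomain
}

interface ExperimentConfig {
  name: string;
  variantShare: number; // 0.5 for a 50-50 test, 0.2 for a 20/80 split
  targeting: Targeting;
}

interface User {
  id: string;
  country: string;
  deviceType: string;
  subdomain: string;
}

function assignVariant(
  user: User,
  exp: ExperimentConfig,
): "variant" | "control" | "excluded" {
  // Anyone outside the agreed sample never sees the experiment at all.
  const t = exp.targeting;
  if (t.countries && !t.countries.includes(user.country)) return "excluded";
  if (t.deviceTypes && !t.deviceTypes.includes(user.deviceType)) return "excluded";
  if (t.subdomains && !t.subdomains.includes(user.subdomain)) return "excluded";

  // Hash the user id together with the experiment name to get a stable
  // number in [0, 1), so a user lands in the same group on every visit.
  const hash = createHash("sha256").update(`${exp.name}:${user.id}`).digest();
  const bucket = hash.readUInt32BE(0) / 2 ** 32;
  return bucket < exp.variantShare ? "variant" : "control";
}
```

Hashing per experiment rather than reusing one global bucket means a user who lands in the 20% variant for one hypothesis isn't automatically in the variant for every other one.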
Then there's a really important one that we've only properly understood in the last couple of months: you don't want to approach a hypothesis with a particular result in mind. If people are convinced a hypothesis is going to be successful, we probably shouldn't bother running it as a hypothesis at all, because if we think we're right and we are right every time, you don't need the extra overhead of an experiment. But we know we're not always right, so we shouldn't assume every hypothesis will be successful. However, that is exactly what we had been doing. The way we know is this: if we found a hypothesis to be successful after running the experiment, we wouldn't ask any more questions. We'd look at our data, go, "Yep, that's correct," and put it live. If it hadn't passed our success metric, we'd do a lot of questioning. Why isn't it passing? Was there a marketing campaign? Was there a particular event in a country that skewed the data one way or another? We had all of these questions, but they were only ever asked when a hypothesis was unsuccessful. That told us we had a massive bias: we thought all hypotheses would be successful, which is a really bad way of doing it. It also meant we weren't doing the minimum amount of work needed to prove a hypothesis. Because we always thought they'd be successful, we baked in quality and built a more elaborate solution than was needed just to validate whether the hypothesis held. A very interesting one to be aware of.

That, and some reevaluation of how we work with hypotheses, led us to the hypothesis framework, and in the last few minutes of this talk I want to share what that framework looks like for us and how we're using it. There are six steps. First, someone submits a hypothesis. That looks like a form; we used to have a form like this, and we've since modernized it into an automated form system. The kinds of things we ask for: the problem space the person is coming from, the hypothesis itself and its acceptance criteria, what success looks like (coming back to that lesson: everyone needs to be on the same page about what success and failure mean), and the sample. We get all of this up front now, and everyone can see it in one document and understand what this hypothesis really means, what it looks like, and how we're going to test it.

From there it goes to our product team, who review the hypothesis. We may have experimented with similar hypotheses in the past, and while we aren't ruling this one out, maybe we'll batch them up, or maybe it's just not something we're ever going to entertain. If our product team is happy with the hypothesis, it comes to the dev and QA teams to analyze it and work out the simplest way of testing it. Once we've understood that, we implement that simplest thing. It could be as simple as adding an image of a button and recording the number of times users click it, maybe with a popup telling the user this feature is coming soon. That way we start gathering insight into whether a feature is even what users want before we've even thought about designing and building the actual feature. Once we've done that simplest piece of work, we run the experiment based on the sample and the kind of test we want. And once we've performed the experiment, after a certain period of time or once we've reached statistical significance, whatever the success metric requires, we analyze it.
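As one example of what "reached statistical significance" can mean in practice, here's a minimal sketch of a two-proportion z-test on a metric like the button clicks above. The data shape and the numbers are illustrative assumptions, not our actual analysis tooling.

```typescript
interface VariantResult {
  users: number;       // users exposed to this variant
  conversions: number; // users who did the success action, e.g. clicked the button
}

// z-score for the difference in conversion rate between control and variant.
function twoProportionZTest(control: VariantResult, variant: VariantResult): number {
  const p1 = control.conversions / control.users;
  const p2 = variant.conversions / variant.users;
  // Pooled rate under the null hypothesis that the two groups don't differ.
  const pooled =
    (control.conversions + variant.conversions) / (control.users + variant.users);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / control.users + 1 / variant.users),
  );
  return (p2 - p1) / standardError;
}

// |z| > 1.96 corresponds to p < 0.05 on a two-sided test.
const z = twoProportionZTest(
  { users: 10_000, conversions: 420 },
  { users: 10_000, conversions: 480 },
);
console.log(`z = ${z.toFixed(2)}, significant: ${Math.abs(z) > 1.96}`);
```

Agreeing up front that this, or a fixed run time, is the gate for calling the experiment done is exactly the "everyone on the same page" point from earlier.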
In that analysis, we conclude whether the hypothesis is correct or not, and we form recommendations. The recommendation could be that this feature now needs to be reworked to production quality and put live as soon as possible. Or it could be that it was unsuccessful, but maybe the sample was wrong, or maybe the way we approached it was wrong. Or maybe we shelve it and don't reevaluate it for another six months, because we can't see a reason why it would become successful even if we rejigged the sample or anything else. So that's the framework we're using. It was defined in the last few months, and we're slowly expanding it to more areas of the business, but we're finding it really, really useful. We've tried to automate it as well: we're a Microsoft tech house, so we're using the Office Forms system as our actual hypothesis specification, and once a form is submitted, it creates a new Teams channel, and we keep the person who requested the hypothesis up to date throughout the whole process. They can see very clearly what we're doing.

So it just leaves me to say that I think it's really important to use hypothesis-driven engineering. It makes things so much safer and so much easier, it takes a lot of pressure off, and it starts informing and baking quality into the products we build. It's really valuable. And when you're able to do hypothesis-driven engineering, it's quite important to define a hypothesis process; I hope some of the things I've spoken about give you an idea of what that process might look like. It's probably different in each company, but I think it's important to have. And so finally, thank you very much for listening, (bright music) and I'll be answering questions in the chat.