Kate Green has been writing code for over 20 years. It’s a great job that has allowed her to think in novel ways. She started as a web designer in 1998, took C in college, and spent many years honing her skills while working all over the Naval Research Lab in Washington, DC. Kate has secured machines, built database applications, done print and web graphic design, and maintained large applications. From there, she moved to the Chronicle of Higher Education and the Chronicle of Philanthropy, where she learned to love DevOps and led front-end web development for both newspapers. After that, Kate took her newfound DevOps love and moved into automation and testing, where she created large build pipelines and built robots, dashboards, and whatever else was needed to test products. Along the way, she learned to look for the big picture and to question the nature of testing. Most recently, Kate led an automation and tooling team for a microservices-based travel application. They built the robots, created developer-centric products for the engineers to use, and wrote automated tests. Her role was a hybrid of team lead, architect, and product manager.
(upbeat music) - Thank you for coming and chatting with me today about testing for continuous delivery. What I'd like for you to get out of this session is to be able to evaluate effective testing, so that you can get the fastest and best testing and ship your software reliably. So let's get started. What is continuous delivery? What is continuous deployment? For the purposes of this talk they're about the same, so let me just differentiate a little bit, but in the end we'll be addressing both of these, because testing happens before we're actually deploying software. In continuous delivery, you are building software that a human will deploy. What that means is it's ready to go, and someone makes the decision to make it happen. In continuous deployment, the software just ships. You have gotten to a point where you are so confident that you're ready for it to go out no matter what, and that's because of testing, which is happening regardless in both of these scenarios. So I will refer to both as CD. Despite these differences, we have roughly four goals that we care about when we're building a CD pipeline. One, deployable artifacts: these are artifacts that work. Two, we are automating the entire pipeline from code merge to deployment. What that means is everything in the middle, there's no toil, there's nothing that your SREs are having to do. Code is merged and artifacts are built. Three, it's reliable: you are 95 to 99-point-however-many-nines-you-like percent sure that this is not going to break your production builds. And four, in the small number of cases where it does break, you want to be able to roll back. For testing, we really care about number three. We also care about one, two, and four, we've got to get those done, but for this talk we're caring mostly about reliability. Testing in CD pipelines is about having concise, concurrent, and reliable tests. These all add up to fast and reliable.
So speed really matters, because if you're sitting around waiting 30 minutes for your tests to run, that's not going to work. Somebody else is committing code behind you and they're having to wait too, and now you're stuck because both of these things want to go out. How do you deal with your merge conflicts? So having speed in your tests really, really matters. Let's go through these three real quick. Concise: each test is testing one thing, or a very small number of things, and it's quick as a result. That means you're testing one assertion. You're saying, okay, does this thing return the right type? That's if you're looking at unit tests. If you're looking at an end-to-end test, a system test, you want to make sure it does one thing correctly. That can be really hard in end-to-end tests, but we'll get to that in a little bit. Concurrent: your tests should be able to run in parallel. At the unit test level this usually isn't an issue, those are usually pretty quick, but if you can add in concurrency, you're going to cut your time to get it done. Reliable is super important, because if your tests don't pass 95% of the time, then they stop being something you can rely on to prove whether your system works. So having tests that pass over 95%, preferably 99%, of the time will help everybody know when there's an actual problem. So let's talk about strategy and your test topology. I'm approaching this from systems like SaaS products, websites, e-commerce, and also data pipelines; that's where my experience is. You can definitely use a similar paradigm with other things you need to build for, but I'm going to use this paradigm, it's the most well known in testing land. So you have unit tests, which are testing all of your functions, testing the lowest-level stuff.
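To make the one-assertion idea above concrete, here's a minimal sketch in Python. The `cart_total` function is hypothetical, invented for illustration; the point is that each test checks one behavior and stays fast:

```python
# A concise unit test: one behavior, one assertion per test.
# cart_total is a hypothetical function used only for illustration.

def cart_total(items):
    """Sum line-item prices; items is a list of (name, price) tuples."""
    return sum(price for _, price in items)

def test_cart_total_returns_sum():
    # One assertion: the function returns the expected total.
    assert cart_total([("socks", 5.0), ("hat", 12.5)]) == 17.5

def test_cart_total_empty_cart_is_zero():
    # A separate concise test, rather than a second assertion above.
    assert cart_total([]) == 0
```

Splitting the empty-cart case into its own test, instead of piling assertions into one function, is what keeps a failure message pointing at exactly one behavior.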
Integration tests are testing parts of the system: testing that your billing works, testing that your customer flows, like your form fields, are working, along with the functions behind them. You're working with a part of the system, and those parts add up to the entire system, which is what your end-to-end tests cover. Those are the most important, but also the most risky, because testing every single user flow is brittle, hard to maintain, and really important to get right. If you're testing in all three of these regions, you're going to catch the vast majority of the errors inside your system. So let's talk about pros and cons real quick and then we'll move on to test focuses. At the unit test level, tests are fast and isolated, but they're not going to catch larger issues. Each one is testing one little piece, so if there's a problem in the handoff between two functions, it's not going to catch it. At the integration level they're functional, you're getting a piece of the system, but you still have some issues with mocking, which is also an issue with unit tests. Mocking is when you pretend to return something from outside the system you're testing. Mocking is pretty much essential, but it can be a smell. It can make your tests less reliable, because you're always sending in the same response with your mocks. How you do mocking is something we're going to address in a moment, but it can be a con. And then the end-to-end tests, we talked about it: they can be brittle, they can be slow, but that's manageable as long as you have the right amount, which to my mind, for most e-commerce and SaaS products, is 20 to 25 end-to-end tests that hit the main flows for users. And these are going to be the ones you're running the most as you go. So for your focus: you don't want to test everything. 100% test coverage? No, no, it's not possible.
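Here's a small sketch of the mocking idea just described, using Python's standard `unittest.mock`. The billing function and gateway are hypothetical; notice the canned response, which is exactly the "same response every time" smell the talk warns about:

```python
from unittest.mock import Mock

# Hypothetical billing example: the charge logic under test is real,
# but the external payment gateway is mocked so the test never leaves
# the system we're testing.

def charge_customer(gateway, customer_id, amount):
    """Charge via an external gateway; return True on success."""
    response = gateway.charge(customer_id, amount)
    return response["status"] == "ok"

gateway = Mock()
# Canned response: the mock always answers the same way.
gateway.charge.return_value = {"status": "ok"}

assert charge_customer(gateway, "cust-42", 19.99) is True
gateway.charge.assert_called_once_with("cust-42", 19.99)
```

The mock keeps the test fast and isolated, but because it always says "ok", it will never catch the gateway changing its real response shape, which is why lots of required mocks are a smell.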
And besides that, you'd be testing things that don't really matter. So think about the parts of your system that change often or break often. You're thinking about your inventory systems. You're thinking about whether your shopping flow is changing, like you want to move your shipping step in front of your payments, something like that. If you're making changes often to experiment with users, those are places where it can be brittle. And then there's the instinct you get as you get further along in your career: something feels wrong about this part of the system, throw some tests in there. That's where you need to be focusing. And as you go, you can go back and pull those out of your CD pipeline. You can still run them locally, you can still run them now and then, but while something is new and it feels like it might be brittle, throw some tests in there. All tests have pitfalls, and number one is maintenance. Every test you write needs to be maintained. So you need to be testing effectively, which means only test the things that you expect to break. Don't test your getters and setters, because those usually just work; if they don't work, you have other problems. The other thing you can do for testing effectively is to evaluate your test suites on the regular. What that means is going in and asking yourself, and asking your engineers to also be looking: does this test what I think it does? Is this test effective? Is there a better way I could do this? Those are the kinds of things you and your engineers should be thinking about with pretty much every new feature you're adding. And then the next thing is to avoid fixtures. Fixtures are a bit like mocking, in that you take in a test user and you always use the same test user. This is more of a unit-test-based thing, but the principle can also be applied at higher levels, with integration and end-to-end tests, by way of property-based testing.
What property-based testing gets you is random inputs. When I'm using property-based testing, I really love it for math functions, and I love it for text inputs. I applied it recently where I wanted to make sure that any input would pass. Property-based testing can throw up to something like 5,000 random inputs at your code. I attached a link here, and I will drop it in the chat so that you all can actually click on it. Learning about property-based testing and applying it is a big weapon, especially at your unit test level, for making sure your suite is highly maintainable and you don't have to do that much maintenance. So the next thing that I've seen a lot is brittle tests. This is usually at the end-to-end level, as we've already talked about. What you want to do is test small parts of your system. Even if you're working on end-to-end test flows, what is important here is to break them up into small things. You don't want to run a whole flow: log in, go shopping, put something in your cart, take it out, put something else in your cart, check out; the test broke like three minutes ago, it's not going to work. So test your login, and then there are ways you can use things like cookies to get yourself logged in at the beginning of every test. I've linked some of these best practices here; Sauce Labs has a lot of resources on making your tests more atomic, smaller, and concise. So then we can work from little bits, pieces, and parts, and you can be sure. This will actually increase the number of tests, but what it does is limit their brittleness. You might have 100 tests instead of 20, but what you're testing are small pieces and you'll be able to catch more problems. Next: your tests are running slow. I would say this is the thing that I've run into the most, especially at the end-to-end level. So you're looking at concurrency, you want to run these tests in parallel.
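The property-based idea from above can be sketched by hand with the standard library; in practice a dedicated library such as Hypothesis generates, shrinks, and replays these random cases for you. The `slugify` function here is hypothetical. Instead of asserting one fixed input/output pair (a fixture), we assert properties that must hold for any input:

```python
import random

# Hand-rolled sketch of property-based testing over random inputs.
# slugify is a hypothetical function used only for illustration.

def slugify(text):
    """Lowercase text and collapse runs of non-alphanumerics into single hyphens."""
    out, prev_dash = [], True
    for ch in text.lower():
        if ch.isalnum():
            out.append(ch)
            prev_dash = False
        elif not prev_dash:
            out.append("-")
            prev_dash = True
    return "".join(out).strip("-")

def test_slugify_properties(trials=5000, seed=0):
    rng = random.Random(seed)
    alphabet = "abc XYZ 123 !?_-"
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(0, 30)))
        slug = slugify(s)
        # Properties that must hold for *any* input, not one fixed example:
        assert slug == slug.lower()
        assert not slug.startswith("-") and not slug.endswith("-")
        assert "--" not in slug
```

The payoff is maintenance: when the function changes, you update the properties rather than dozens of hard-coded input/output pairs.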
And the other thing is: look at your test suites again for relevance. This is a second reason why you need to be in there asking yourself, and asking your engineers, is this a valid test? Is this test still doing anything useful? One of the great things about version control is that you can take a test out and put it back in, because you have the history. So take it out, skip it. Those are the biggest fixes for slowness. There are a few other things, but those are the main ones I've seen, and concurrency especially is super-duper helpful. So, mocking, and I'm going to put fixtures here as well: mock as little as possible, because when you notice that you have a bunch of mocks, it means your system is not decoupled enough. You want to think about pulling stuff out to the point where it's reasonable. You don't want to overdo it either, because then your code becomes harder to maintain on the actual codebase side, not just the tests. But you want to think about decoupling your system as best you can. So yeah, like I just said, lots of required mocks are a smell, and the same thing can be said for fixtures. So it's best, as much as you can, to keep them out. And I totally get it: sometimes you just want something out the door and you want to put a couple of tests on it, and a fixture or a lot of mocks can make that happen faster, but that's technical debt. That's debt you want to pull out eventually. So my final thing is: does it really have to be a test? Because, like we've talked about with brittleness at the end-to-end level, is there a way we can limit our end-to-ends and use them as monitoring, throwing them into our monitoring system? So let's start there and talk about what that means. That means running them all the time, every 15 minutes, every hour. And at that point, you're going to be accepting a failure rate.
That's especially true if the suite you'd like to apply to monitoring includes a true end-to-end test, which is to say from logging in all the way to checking out. You do need a couple of those, it's really important to have at least one, because then you can apply it here. What this gets you is a canary, which means it's going to tell you when something is down. If you're running this test on the regular, it becomes a monitoring piece, not necessarily something that you want to pass 100% of the time, because it's long and we accept that risk. But what this does get you: if this test fails twice in a row, if it fails two times in an hour, whatever your threshold is, you're sending an alarm to your people, to your SREs, to whoever's getting your alerts, to whoever is on call that day. This is going to tell you your site's down. Probably not as fast as something like Pingdom or other services that check for uptime, but this is a way to make sure that your system is working as you expect. And let's talk about monitoring as well, because monitoring can also catch things breaking. In monitoring you have your golden metrics, but the ones most related to testing are increased 500 error rates. There are also 400 error rates. If you're seeing an increase in 400 error rates, it could be that you have a typo in your links or something else that's showing up as a user error: 400 errors are user errors, 500s are system-based errors. So you're looking at 500 error rates first, but also 400 error rates, and those are something else you can send an alert on. If your request response times go up, there's probably something up with your API, so you need to be checking on that. And the same thing with latency: if your latency is going up, there's something going on inside of your network. So you need to be checking and sending alerts on all of those. And then there are uptime monitors.
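The "fails twice in a row" canary rule above can be sketched in a few lines. This is a minimal illustration, not a real alerting system; the threshold and the pass/fail history format are assumptions:

```python
# Canary rule sketch: page only when consecutive failures reach a
# threshold, since a scheduled end-to-end check is allowed an
# accepted failure rate and a single miss is treated as noise.

def should_alert(results, threshold=2):
    """results: most-recent-last list of booleans (True = check passed)."""
    streak = 0
    for passed in results:
        streak = 0 if passed else streak + 1  # count trailing failures
    return streak >= threshold

assert should_alert([True, False, False]) is True    # two in a row: page someone
assert should_alert([False, True, False]) is False   # isolated blips: accept
```

In a real setup the same idea usually lives in the alerting layer (e.g. an alert condition that fires after N consecutive failed check runs) rather than in hand-written code.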
Your testing can serve as monitoring in that way, but you may also want something that's just testing that your site loads, every time. So those are some things that will help you limit how much testing you need, because now you have something else watching. And then the last thing is the role of manual testing. Humans do things like exploratory testing way better than robots do, because robots aren't creative. That's what makes us so great: we invented these machines. We imagined a whole new part of the world, the part that we're mostly occupying, especially now that we're all on Zoom all the time. So we are also the ones who are going to be creative enough to see how the system behaves under pressure, how it behaves when some silly user does something that nobody thought of before. So having a manual testing regimen is still important even when you have continuous delivery. Continuous delivery makes things a lot more reliable inside of your system, it is a great thing, but you still need people looking around, poking around, figuring out how to break something. Because I can guarantee you, if your people don't find it, somebody else will, and then something could go wrong in ways that you don't want. So that's really important to consider, and that's a lot of the balance when you're building testing. On one side you have these humans who are doing creative things, but what out of that can you then turn into automation? There are some things, and there's also building automation tools to help your testers. So there's a balance here of doing what the humans do best and doing what the robots do best. Now we need to evaluate. None of these metrics are a single point in time; there's nothing that's proven at a single point in time. So that's the first thing I've got to say. The next thing is that number one and number two, reliability and time to complete a test suite, don't mean anything on their own, because you could have no tests. No tests are fast; those numbers alone don't mean anything.
So let's talk about defects detected: how effective are your tests? How many defects did they find last month? Keep track of those, and also track what's slipping through, so there's a comparison, a percentage you can work with. Then there's also mean time to resolution for bugs found in testing versus found in prod. What that gets you is how fast you're solving them when you're catching them before you deploy versus in prod. And there's also number three, the raw numbers, but percentages also work well to express your test suite's effectiveness. The biggest thing here is to show your work. A lot of people don't take testing very seriously, and these kinds of metrics can show how effective your test suites actually are, and hopefully buy you more resources to test even better. So this is what I have. This is all about building better tests and better test paradigms for your continuous delivery and continuous deployment pipelines. Thank you for coming. (upbeat music)