AI and Machine Learning in Test Automation

On April 16, Oren Rubin, CEO and Founder of testim.io, spoke at our Test in Production Meetup on Twitch.

Oren explained the differences between AI and automation, problems with existing test automation solutions, how AI/machine learning can be used to address software testing problems, and more.

Watch Oren’s full talk.

FULL TRANSCRIPT:

Yoz Grahame: Marvelous. Let me just get these set up correctly. Hello.

Oren Rubin: Hello.

Yoz Grahame: Hello, welcome to Test in Production. Thank you so much for joining us. Today is April 16th, 2020. Our guest today is Oren Rubin, CEO and founder of Testim.io. Hello, Oren, thank you for joining us.

Oren Rubin: Hello my friend, happy to be here.

Yoz Grahame: Thank you. You’re going to be showing us how AI and machine learning can dramatically improve test automation, which I’m looking forward to hearing about. At LaunchDarkly, and among the people who use it, we do all kinds of testing, most notably A/B testing. But functional testing is rather different from that, so before we get started, can you tell us a little bit about functional testing?

Oren Rubin: Yes, sure, of course. There are different types of testing. When you talk about functionality, take a calculator: we want to make sure the output is correct, meaning that if I do one plus one, the result will be two. Then there are other types of testing. Performance testing asks whether the answer two comes back fast enough. Load testing asks what happens if 2,000 people do it at the same time, and how that impacts performance. There’s accessibility: if someone uses a screen reader, will they understand where the one and the two are, and get the result that two is shown? And there’s visual validation: I want to make sure the number two is aligned to the right or to the left, and that all the buttons are aligned. Those are called visual validations.
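To make the calculator example concrete, here is a minimal Python sketch separating a functional check (is the answer right?) from a performance check (is the right answer fast enough?). The `add` function is a hypothetical stand-in for the calculator under test, and the 10 ms budget is an arbitrary assumption, not a recommended value.

```python
import time

def add(a: int, b: int) -> int:
    """Hypothetical stand-in for the calculator feature under test."""
    return a + b

def functional_check() -> bool:
    # Functional testing: is the output correct? (1 + 1 should show 2)
    return add(1, 1) == 2

def performance_check(budget_seconds: float = 0.01) -> bool:
    # Performance testing: does the correct answer come back fast enough?
    start = time.perf_counter()
    result = add(1, 1)
    elapsed = time.perf_counter() - start
    return result == 2 and elapsed < budget_seconds

print(functional_check(), performance_check())
```

Load, accessibility, and visual checks would each need their own harness; they are left out here.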

Yoz Grahame: Cool. So lots of different ways to make sure that this thing is working as designed and possibly even catching things that weren’t explicitly designed.

Oren Rubin: Yes, like different aspects of the application.

Yoz Grahame: Cool. So yeah, we do a fair amount of that at LaunchDarkly, and I believe you’re going to go into things like the differences between levels of testing and, what I’m especially looking forward to, dealing with some of the brittleness that can happen in browser test automation.

So let me switch over to the slide view and you can take it away.

Oren Rubin: Sure, awesome. I think the first thing I want to make sure everyone understands (and of course we talked about what functional testing is) is [inaudible 00:03:02] and [inaudible 00:03:05]: what part of my application am I testing? A unit test is the most basic, the smallest unit, and of course I can test a few units together and how they interact with each other. That’s called integration testing. And end to end means I’m running everything at the same time. Sometimes people get that confused, because when they think about an end-to-end flow, they don’t just think about all the systems (the back end, the front end, the servers, everything up and running); sometimes they think, oh, I want to do a user story from start to end. So some people call that end to end as well.

It mainly means that the entire system is up and running, but, as I said, some people think end to end means the entire user story: the user logs in, adds to cart, and so on. Mostly, though, end to end means everything is out there. And the [inaudible 00:04:10] actually shows you the good and the bad, the pros and cons, of every one of those approaches.

With unit tests, there are a lot of units, so to get good coverage you basically have to go over every unit and write a test for it. It takes a long time to get full coverage, and also to get a lot of confidence, because two units can each work by themselves but not integrate well together. And trying to think of all the different types of integration between all the different units is hard, especially if the design isn’t clean and too many units know about each other, so you have to check them together. That makes it much harder.

End to end, on the one hand, has the pros, of course: higher coverage. You can check Gmail by sending an email, receiving it, and making sure you got it. You can write a test like that very easily; in the same day you’ll have 30% code coverage for Gmail. But on the other hand it’s slower, and you don’t know exactly what went wrong. Was it the back end, was it the front end? So you need a lot more tools for [inaudible 00:05:22] analysis. So people actually tend to have a mix of both.

I love showing the difference between unit tests and integration tests: something can work great by itself, but you want the whole thing to work. If you google “unit test versus integration test” you’ll see hundreds of examples like these.
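A classic instance of “each unit works, the whole doesn’t” is a unit mismatch. In the minimal Python sketch below, both functions are hypothetical: one returns cents, the other expects dollars. Each passes its own unit test; only a test that wires them together exposes the bug.

```python
def price_in_cents(item: str) -> int:
    """Unit A (hypothetical): looks up a price, in cents."""
    prices = {"book": 1250}
    return prices[item]

def format_price(dollars: float) -> str:
    """Unit B (hypothetical): formats a price given in dollars."""
    return f"${dollars:.2f}"

# Unit tests: each unit is correct on its own terms, so both pass.
assert price_in_cents("book") == 1250
assert format_price(12.5) == "$12.50"

def checkout_label(item: str) -> str:
    # Integration: Unit B receives cents but assumes dollars.
    return format_price(price_in_cents(item))

# An integration test expecting "$12.50" would fail here.
print(checkout_label("book"))
```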

And one last thing before we get started: understand where your end-to-end functional tests can actually run. People think you can only run them before production and say, oh no, that’s QA, that’s another team. But there’s no reason not to run end-to-end tests both before production and after releasing to production. Some people actually use them for synthetic monitoring, running them every minute or every five minutes against production just to make sure they’re safe. That’s in addition to your APM tools, which usually even provide the capability of running end-to-end tests. And again, these tests are slower, but they mimic real users more closely.

One thing you should note: if it’s just your test, think about analytics and how you affect them in production. Make sure you can disable analytics for your tests so they won’t interfere with your real data, because other people might depend on analytics to decide where the product is going, and you want to make sure your tests don’t skew that.
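One way to keep production probes out of the numbers is to tag their traffic and filter it before it reaches analytics. The Python sketch below is a minimal illustration under that assumption: the `synthetic` flag, the event names, and the `monitor-bot` user are hypothetical, not part of any particular analytics or APM product.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    user_id: str
    synthetic: bool = False  # hypothetical flag set only by the test runner

def run_synthetic_probe(send) -> None:
    # One end-to-end probe, e.g. scheduled every few minutes against production.
    for step in ("login", "add_to_cart", "checkout"):
        send(Event(step, "monitor-bot", synthetic=True))

def events_for_analytics(events: list[Event]) -> list[Event]:
    # Drop probe traffic before it reaches the metrics other teams rely on.
    return [e for e in events if not e.synthetic]

collected: list[Event] = []
run_synthetic_probe(collected.append)
collected.append(Event("checkout", "real-user-42"))
print([e.user_id for e in events_for_analytics(collected)])
```

Filtering at ingestion, rather than turning analytics off in the app, keeps the probe exercising the same code path real users hit.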

So, as I said, people run different tests in different combinations. Sometimes it depends on the location [inaudible 00:07:18] timing. On every commit, which can be five times an hour, you run very short, fast tests, usually unit tests. Before you merge, you can test only the front-end side or only the back-end side, depending on where your code is. When you merge, you do some integration tests across several units: either the front end with all the servers mocked, or vice versa, just the server. But before releasing you want to check with end-to-end tests, and sometimes people even want to check different browsers.
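The stage-by-stage split above can be written down as a small mapping from pipeline stage to test suites. This is a minimal Python sketch; the stage names and suite names are hypothetical placeholders, not a particular CI system’s configuration.

```python
def suites_for(stage: str, changed_areas: set[str]) -> list[str]:
    """Map a pipeline stage to the test suites worth running there (hypothetical names)."""
    if stage == "commit":
        return ["unit"]  # fast, runs many times an hour
    if stage == "pre_merge":
        # Only the side of the stack this change touches: front end and/or back end.
        return ["unit"] + [f"{area}_tests" for area in sorted(changed_areas)]
    if stage == "post_merge":
        # Integration across units, with the other side mocked.
        return ["frontend_with_mocked_servers", "backend_only"]
    if stage == "pre_release":
        return ["e2e_chrome", "e2e_firefox"]  # full stack, possibly several browsers
    raise ValueError(f"unknown stage: {stage}")

print(suites_for("pre_merge", {"frontend"}))
```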

So what people know, and hate, about end-to-end tests is that they’re slow to author and that they’re brittle and flaky; they break all the time. Those are the two things I want to focus on right now, to show you where AI can make a difference. Before I even start, I want to say that when you asked me to talk about AI, I tried to slice it into about four pieces: where we are today, what we have right now, what’s going to come over the next year, what’s going to come the year after, what the steps are going to be. It’s just like Tesla and autonomous cars: well before fully autonomous cars exist and get approved, you already have something that isn’t exactly a self-driving car but knows how to keep a lane, even if it doesn’t know how to stop when the traffic light is red. So there are different levels, and that’s what people try to do, especially now in the world of agile.

People want to chunk things into small, valuable pieces, and I believe machines can help us at different levels; at some, they already have. So let’s separate this into UI interactions and validations. I’ll talk mostly about visual validations, even though there are different types of validation. Say I use UI interactions to click one plus one; now I want to validate that two is shown. I can validate the text, I can validate the network request, I can validate the pixels that are shown and that the buttons are aligned; there are so many things I can validate. Today we’ll focus more on visual validation, show the challenges, and of course where AI comes in.

So let’s start with user interactions. To show how hard this sometimes is, I like to play a game. Is there anyone watching now who’s familiar with basic HTML, any developers out there? If so, just raise your hand and let us know; it’d be nice if you can guess. I’m going to take a very basic app and ask some questions, like: how do you choose what to use when you want to automate pressing a button? Instead of a human doing it, we want to automate our tests and run them automatically, so we ask: how can I find the play button?

I think this also explains why it’s harder for machines to do it themselves, learning from humans without us instructing them how. The basic way today is actually coding. But coding means someone has to know the app and has to think, so why can’t a bot do that automatically for us? Until now bots were very simple, and you had to program them to decide which property to use. All the test automation tools used to let you choose just one property, through either CSS selectors or XPath. They’re basically very similar: both are ways to query a graph.

The properties available in HTML are things like an ID, a class, the text, the tag name, or a link to somewhere. Those are the things you can use. CSS selectors and XPath also let you find something inside something, but basically these are the attributes you have.
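Each of those properties can be turned into a single-property locator. As a stand-in for a browser driver, the sketch below uses Python’s standard-library `xml.etree.ElementTree`, which supports a limited XPath subset; the markup is hypothetical. Note that an attribute predicate on `class` matches the whole string exactly, unlike the token-based CSS `.primary`.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page standing in for the app's DOM (hypothetical markup).
PAGE = """
<html><body>
  <div class="player">
    <button id="play-btn" class="btn primary">Play</button>
    <a href="/help">Help</a>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)
by_id = root.find(".//*[@id='play-btn']")               # CSS: #play-btn
by_class = root.find(".//button[@class='btn primary']")  # exact match, unlike CSS .primary
by_text = root.find(".//button[.='Play']")               # text match (XPath-style)
by_link = root.find(".//a[@href='/help']")               # CSS: a[href='/help']

print(by_id is by_class is by_text, by_link.text)
```

Every one of these locators bets on a single property staying put, which is exactly the fragility discussed next.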

So what I like to ask is: let’s say we always use the ID. Why not? It’s so simple: if there’s an ID, let’s choose it. That’s the question I ask people, and if someone knows, just tell me; it’d be great to hear your thoughts. Using an ID sounds ideal, since it’s a unique identifier, so why does it sometimes not work the next time I run the test? I’ll share a few of the reasons I’ve seen it fail.

The first reason is randomly generated IDs. I didn’t change anything: I want to click on something that has an ID, I look at the dev tools, I see the ID, I right-click, copy it, put it in my test, and then I run the test and it fails. It could be that you have a reusable component. When that component appears twice on a page, the two instances can’t share one ID, so the component adds randomly generated IDs. That’s the first reason something can fail, and there are a few more just for IDs. If something fails after a week, it could be that someone just changed the code. If you rely on one property and someone changes it, your test breaks, because assuming the element has some property is a bit like white box testing. You’re saying, I know it has this specific property. That’s not what a human does. A human doesn’t check, oh, it has an ID, okay, I’ll click this button. A human clicks the button based on other things as well: the text, the location, the image, et cetera.
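The failure mode is easy to reproduce. In this minimal Python sketch, elements are plain dicts rather than real DOM nodes, and `render_card` is a hypothetical reusable component that assigns itself a fresh random ID on every render. The ID copied from dev tools during authoring matches nothing on the next run, even though nothing in the app changed.

```python
import uuid

def render_card(title: str) -> dict[str, str]:
    # Hypothetical reusable component: each instance needs a unique id,
    # so it generates a fresh random one on every render.
    return {"id": f"card-{uuid.uuid4().hex}", "text": title}

def find_by_id(page: list[dict[str, str]], element_id: str):
    return next((el for el in page if el["id"] == element_id), None)

first_run = [render_card("Play"), render_card("Pause")]
recorded_id = first_run[0]["id"]  # copied from dev tools while writing the test

second_run = [render_card("Play"), render_card("Pause")]  # same app, nothing changed
print(find_by_id(first_run, recorded_id) is not None)  # found while authoring
print(find_by_id(second_run, recorded_id))             # not found on the next run
```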

So that’s another thing that can happen: someone changes the code, and your test fails. Another thing I’ve seen: you look at the [inaudible 00:14:13] and say, it’s the same ID, so why can’t I find it? Sometimes the component is inside an iframe. So first you have to find the iframe and switch to the right one. A lot of people have things like cached [inaudible 00:14:26], et cetera, so things are randomly generated there, and those can break your test very, very easily. Finding the iframe is again the same challenge. And it doesn’t depend on you: it’s someone else’s application, or you’re embedded in someone else’s application.

One of the weirdest things, not the weirdest but close, and I still see it: the standard says an ID must be unique within a document, but a lot of people have more than one element with the same ID. I’ve seen that a lot. They copy-paste something and forget, and the browser doesn’t enforce it; it still loads the page, and the page still works. But most queries, like getElementById, return just one result, which is the first one. So sometimes you’ll look at your page and think, why am I getting another element, what’s going on? It could be that you have the same ID twice. I’ve seen that so many times.
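Here is a minimal Python sketch of the duplicate-ID trap, again using the standard-library `ElementTree` in place of a browser (the markup is hypothetical). Like `getElementById`, `find()` silently returns the first match in document order. A cheap lint that counts IDs can catch the problem before it shows up as a confusing test failure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical page where a form was copy-pasted along with its button id.
PAGE = """
<html><body>
  <form><button id="submit">Search</button></form>
  <form><button id="submit">Log in</button></form>
</body></html>
"""

root = ET.fromstring(PAGE)
# Like getElementById, find() returns only the first match in document order.
first = root.find(".//*[@id='submit']")
print(first.text)

# A quick lint a test suite could run: list ids used more than once.
counts = Counter(el.get("id") for el in root.iter() if el.get("id"))
duplicates = sorted(i for i, n in counts.items() if n > 1)
print(duplicates)
```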

This is the weirdest thing I ever saw. A few frameworks, when they wanted to render something on top of something else, generated two bodies: the HTML had two bodies, and they put them in a random order. So sometimes it worked and sometimes it didn’t. I was like, what? There’s only one ID here, so why does it sometimes not work? Apparently all the tools usually look at the first body, so you want to make sure you don’t have something weird like that.

Those are the basics of what can break a test that relies on an ID, even though you didn’t change anything. [inaudible 00:16:15], for example, asks: why not use a class? Okay, don’t use the ID; you have a reusable component, so use a class. But classes are for styling, which means they will change. Tomorrow people make adjustments, and if you’re finding an element based only on a styling property, then when someone changes the design, it might look different. This is where people start writing, oh, I want the Nth [inaudible 00:16:43], I want the third one, and those tend to break as well, because you’re relying on it being the third. And with CSS selectors, until the CSS4 standard comes out (they’ve been talking about it for three years now), you can’t find an element based on its content. Instead of the third one, can I find it because it says Miami next to it, and then click on the image?

So those things are not [inaudible 00:17:11] something you can do with XPath, but people, especially developers, really love CSS locators because they know them. So it’s annoying, because CSS classes often break every time you change the design, and people change the UI all the time. That’s the problem right now: people are more agile, they want to release five times a day.
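The difference between “the third one” and “the one that says Miami” shows up as soon as the list changes. This minimal Python sketch uses the standard-library `ElementTree` XPath subset, which supports both a positional predicate (`li[3]`) and a child-text predicate (`li[span='Miami']`); the card markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of destination cards, each with a label and an image.
root = ET.fromstring("""
<ul>
  <li class="card"><span>Paris</span><img src="paris.png"/></li>
  <li class="card"><span>Tokyo</span><img src="tokyo.png"/></li>
  <li class="card"><span>Miami</span><img src="miami.png"/></li>
</ul>
""")

BY_POSITION = "./li[3]/img"            # "the image in the third card"
BY_CONTENT = "./li[span='Miami']/img"  # "the image in the card that says Miami"

print(root.find(BY_POSITION).get("src"), root.find(BY_CONTENT).get("src"))

# A designer adds a new card at the top.
root.insert(0, ET.fromstring('<li class="card"><span>Berlin</span><img src="berlin.png"/></li>'))
print(root.find(BY_POSITION).get("src"), root.find(BY_CONTENT).get("src"))
```

Before the change both locators find `miami.png`; afterwards the positional one silently points at Tokyo.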

All the other options, trust me on that, are also super fragile. They all tend to break, and I think the conclusion here is that relying on a single property, the way we humans would write it, is too fragile. On the other hand, a human can’t say, I want to look at 50 properties and decide based on that, but that’s something computers can do very easily. And I think the first level, by the way, is using several locators, several properties, and not just one. Let me try to show an example. First we need a test, so let’s pick something. Yoz, are you with me here?

Yoz Grahame: Yep.

Oren Rubin: So I have a few questions for you my friend.

Yoz Grahame: Go for it.

Oren Rubin: I want to show how multi-locators work, but before that I need to make a change to the application. The test is going to click this yellow button right here. I added a breakpoint in the middle, before it clicks the yellow button. What I want to do is change some of the properties so it looks different. So first, what’s your favorite color?

Yoz Grahame: Let’s go with red.

Oren Rubin: Red, red is awesome. I like that you’re thinking, wait a second, what’s my favorite color?

Yoz Grahame: That’s a difficult question.

Oren Rubin: Oh, blue, no way.

Yoz Grahame: No, blue, ah.

Oren Rubin: Red is awesome. Let’s change that. I’ll also pick my favorite color, pinkish. And I need some numbers. Can you give me a random number?

Yoz Grahame: 84.

Oren Rubin: Sorry?

Yoz Grahame: 84. Are we talking about…

Oren Rubin: I’ll use the four, but here I can change the 84.

And I’ll change one of the classes as well; there’s a number here, so it’s going to look different. Now I’m going to resume the test, so it’s going to look for that yellow button. What I want to show, and ask, is this: when something changes, we humans can still find the element. The question is, how do we do that? If you look at all the properties that element had, there are a lot of them. So many of them. The fact that we changed the text, one of the classes, and the location: those are very mild changes. And if you do a statistical analysis, the computer can do that super, super fast.

Of course, you can look at the entire body and all the different properties the elements have, and compare what you saw earlier, the DOM of the application a second ago, with the application now. Here you get a 74% match. That’s the confidence we have that this is the same element, even though it doesn’t look exactly the same, which is great. It means that in this case the step, and the test, still passes; it’s good enough. Of course, you want to set your own threshold, but it means that changes you’d usually expect to break your test no longer do. Your tests won’t be so brittle. That matters especially when you change things for A/B testing, and I hope people are doing that: when you make some changes to an application, you don’t want to start rewriting the entire test just because you made a small change to the way things look.
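The idea of matching on many properties at once can be sketched as a weighted similarity score with a threshold. This is a minimal Python illustration, not Testim’s actual model: the property names, uniform weights, and 0.6 threshold are all assumptions chosen for the example, and the 62.5% it produces is unrelated to the 74% in the demo.

```python
def match_score(recorded: dict[str, str], candidate: dict[str, str],
                weights: dict[str, float]) -> float:
    # Weighted fraction of properties whose value is unchanged since recording.
    total = sum(weights.values())
    matched = sum(w for prop, w in weights.items()
                  if recorded.get(prop) == candidate.get(prop))
    return matched / total

def locate(recorded, candidates, weights, threshold=0.6):
    # Pick the most similar element; give up if even the best is below threshold.
    best = max(candidates, key=lambda c: match_score(recorded, c, weights))
    score = match_score(recorded, best, weights)
    return (best if score >= threshold else None), score

recorded = {"id": "buy", "tag": "button", "class": "btn yellow", "text": "Buy 4",
            "type": "submit", "parent": "cart", "x": "120", "y": "40"}
# Same button after demo-style edits: new class, new text, moved sideways.
changed = {**recorded, "class": "btn red", "text": "Buy 84", "x": "180"}
help_link = {"id": "help", "tag": "a", "class": "link", "text": "Help",
             "type": "", "parent": "footer", "x": "10", "y": "500"}

weights = {prop: 1.0 for prop in recorded}  # uniform weights as a starting point
element, score = locate(recorded, [help_link, changed], weights)
print(element is changed, score)
```

Five of eight properties still match, so the edited button is found even though its text, class, and position changed.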

But what I want to show is actually something else: what can you improve over time? Computers can do even more, because they can put weights on things, say this property is good or bad, and give it a score. Remember we talked about randomly generated IDs. An ID is great, but if it changes all the time, it doesn’t help us. The more you run your test, the more you can learn, and that can be a great help. That’s something humans can’t do: they can’t look at every test run, at millions of properties, and adjust. Computers can do that very easily. Say this element has five stars here. Let’s do a manual improve right now and just say, this is the element again, this is the same element. What you’ll see is that the score is now lower, because something changed within five minutes and we saw two different values, so the score goes down. The confidence we have in that property not changing is lower. And of course it’ll keep getting lower if it’s a randomly generated ID.
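The “learn from every run” step can be approximated with a very simple heuristic: trust a property less each time a new value shows up for it. The Python sketch below is a minimal illustration under that assumption (weight = 1 / number of distinct values seen). It is not the scoring the demo tool uses.

```python
from collections import defaultdict

class PropertyStability:
    """Learns, across test runs, how much to trust each property of one element."""

    def __init__(self) -> None:
        self.seen = defaultdict(set)  # property name -> distinct values observed

    def observe(self, element: dict[str, str]) -> None:
        # Record the values seen for the (same) element on a passing run.
        for prop, value in element.items():
            self.seen[prop].add(value)

    def weight(self, prop: str) -> float:
        # 1.0 while the value never changes; drops as new values show up.
        distinct = len(self.seen.get(prop, ()))
        return 1.0 / distinct if distinct else 0.0

stability = PropertyStability()
for run in range(3):
    # Hypothetical runs: the id is regenerated each time, the text is not.
    stability.observe({"id": f"btn-{run}", "text": "Play", "tag": "button"})
print(stability.weight("id"), stability.weight("text"))
```

Weights learned this way could feed the `weights` argument of a multi-property matcher, so a randomly generated ID gradually stops mattering.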

So that’s the idea of multi-locators, and I hope everyone understood what I meant: the first level is having multi-locators and more stable tests. The second thing, which I started to show you with the manual improve, is: can you improve every time you run the test? If the test passes, great, it worked; let’s see what we can learn from that. That’s what I think is the next step [inaudible 00:23:48] happen, and I’ll talk a bit about what I think is the step after that, something we’ll see this year, which I call autonomous testing. But I don’t think it’s fully, fully, fully autonomous. Full autonomy, where you hand over an application and it just tests it for you by clicking randomly, will take more years. How would it know whether those buttons should be aligned, or whether the text should be in this font? If you don’t show it, that’s going to be hard for a computer. I don’t think computers are there yet; they can’t tell whether something is pretty better than you can.

So I think all three levels, one, two, and three, will still need some human help. The third level, I think, is connecting to production. I’ve seen so many times where someone was testing without being aware of what’s going on in production. Connecting to production gives you so many things. One, you can automatically generate the tests, so you don’t have to think, oh, what should the test be? You need to think about the bad paths, not the happy paths; for the happy paths, you’ll have your users in production running them all the time. You should see what users are doing, not only to generate tests but to understand coverage, and I’m talking about user coverage, not just code coverage.

Code coverage tells you how many lines of code your tests have exercised, and it treats all lines as equal. But not all lines are equal: login can happen millions of times a day and is critical for your business, while changing the profile image isn’t the most critical. If you only have time to run five tests, I’d recommend login, add to cart, and checkout, and do the change-profile-image test later. Even if you have time to run or author them all, you should start with the ones that matter most to the business. That’s what testing is: making sure the product features you set up behave as expected, and of course “as expected” means what’s best for the business.
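User coverage can be approximated by weighting each flow by how often real users run it, then spending a limited test budget on the heaviest flows first. This is a minimal Python sketch; the flow names and daily counts are hypothetical, standing in for what an analytics tool might report.

```python
def pick_tests(flow_counts: dict[str, int], budget: int) -> list[str]:
    # Spend a limited test budget on the flows real users run most often.
    return sorted(flow_counts, key=flow_counts.get, reverse=True)[:budget]

def user_coverage(tested: list[str], flow_counts: dict[str, int]) -> float:
    # Share of observed flow executions that at least one test covers.
    return sum(flow_counts[f] for f in tested) / sum(flow_counts.values())

# Hypothetical daily counts of each flow, as an analytics tool might report them.
daily = {"login": 600, "add_to_cart": 250, "checkout": 140, "change_profile_image": 10}
chosen = pick_tests(daily, budget=3)
print(chosen, user_coverage(chosen, daily))
```

With three tests, the budget goes to login, add to cart, and checkout, covering 99% of observed flow executions in this made-up data.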

So here’s what I think will be possible. If you have something that shows you what’s going on in production, just like Google Analytics or other analytics, whether that’s [inaudible 00:26:28] et cetera, you know what’s going on: what user flows people are doing. And you can create those tests automatically. Generating the tests from watching many users actually means they’ll be more stable: if there’s a randomly generated ID, two different users will get different random IDs, so the machine can learn automatically and say, no, I wouldn’t use the ID property in this case, it’s different between users, but the text is the same.
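The cross-user learning described here can be sketched as keeping only the properties whose value is identical across every recorded session. In this minimal Python example the click recordings are hypothetical; the randomly generated `id` differs per user and is dropped, while `tag` and `text` survive as the locator.

```python
def stable_properties(observations: list[dict[str, str]]) -> dict[str, str]:
    # Keep only the properties whose value was identical for every user.
    first, *rest = observations
    return {prop: value for prop, value in first.items()
            if all(obs.get(prop) == value for obs in rest)}

# Hypothetical recordings of the same click from three production sessions.
clicks = [
    {"tag": "button", "text": "Add to cart", "id": "c-91f2"},
    {"tag": "button", "text": "Add to cart", "id": "c-07aa"},
    {"tag": "button", "text": "Add to cart", "id": "c-5d3e"},
]
print(stable_properties(clicks))
```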

So that’s something I think would be super, super helpful. Looking at all that information from millions of users within a day or two and creating all the tests for you is something AI can help with, and we humans suck at that. I hope that’s clearer now.

Yoz, by the way, any questions, from you or anyone out there? If you have questions, feel free to ask now rather than waiting until the end; I’ll be happy to answer.

Yoz Grahame: Yep, I’m keeping an eye on things, and we actually have some questions. I recommend carrying on with the slides for the moment, and then we can field all the questions together at the end. I find that works best.

Oren Rubin: Okay, perfect, perfect. So let’s talk about… sorry, I was going backwards instead of forward. Visual validation. For those who weren’t here at the beginning, let’s take something very, very simple. I can even show some examples of why visual validation is worth doing. I’ll start with the drawbacks, actually: it’s slower. You’re taking a screenshot, and you need to process millions of pixels every time and compare them, as opposed to just asking, oh, what’s the text here?

Looking here, we can see a few flaws that usually won’t be found without visual validation. For example, the image is in the back; you see the “cowabunga” there. An image sitting behind something is hard to find without visual validation, but take a screenshot and you can see how it’s different; that’s very easy. Then size. I won’t mention the name of the company, but it starts with S, then an A, and there’s one more letter. There was a bug where the entire application, instead of being 1024 pixels wide (that was a few years back, that was the resolution back then), rendered at 124. It was tiny, but all the tests passed. If you were finding elements using the DOM, all the tests passed.

So size does matter after all, apparently. And another one: this is the flying pony tail, I think that was a Google bug. I think they had it only for production, only for administrator [inaudible 00:29:57], a flying pony going around, and they accidentally sent it to everyone. When you’re looking for text, you ask, hey, does it show “cowabunga”? Yes, it shows it. But the question that doesn’t answer is: is anything else shown there? So those are the reasons for doing visual validation. The reason people don’t do it a lot is not just that it’s slower, but that it used to be very unstable.

Why? Because people did pixel-by-pixel comparison, and that tends to break a lot. If you look at this (I’m toggling between these two images), with anti-aliasing every display adapter does something different. Some of them are actually [inaudible 00:30:44], so it’s funny, but that means if you render on the same device, the same machine, without changing the app, and just render again, you’ll see different results. So for the last 20 years people have said, oh wait a second, can I put in a threshold, say no more than 10% change? And then you can get false positives: with a threshold like that, a plus can turn into a minus and the test still passes. That matters because you’re also using visual validation as functional testing. If you do one plus one on the calculator and take a screenshot, it checks that the number two is shown. It also checks that it’s on the right and in the same font. So it does functional testing too.
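Both failure modes of naive screenshot comparison fit in a few lines. In this minimal Python sketch, images are hypothetical grayscale grids (0 to 255) rather than real screenshots. Exact comparison flags a one-pixel anti-aliasing wobble, while a loose 10% global threshold accepts a “+” that rendered as a “-”, because only 9 of 100 pixels differ.

```python
def diff_ratio(a: list[list[int]], b: list[list[int]], tolerance: int = 0) -> float:
    # Fraction of grayscale pixels (0-255) that differ by more than `tolerance`.
    total = changed = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > tolerance:
                changed += 1
    return changed / total

# 1) Anti-aliasing: the same edge pixel rendered twice comes out slightly different.
run_1, run_2 = [[255, 128, 0]], [[255, 131, 0]]
print(diff_ratio(run_1, run_2), diff_ratio(run_1, run_2, tolerance=8))

# 2) A loose global threshold: a "+" rendered as a "-" changes only 9 of 100 pixels.
SIZE, WHITE, BLACK = 10, 255, 0

def glyph(with_vertical_stroke: bool) -> list[list[int]]:
    img = [[WHITE] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        img[SIZE // 2][i] = BLACK  # horizontal stroke
        if with_vertical_stroke:
            img[i][SIZE // 2] = BLACK  # vertical stroke
    return img

plus, minus = glyph(True), glyph(False)
print(diff_ratio(plus, minus) < 0.10)  # True: a 10% threshold would accept this
```

A per-pixel tolerance absorbs rendering noise without loosening the overall threshold, though it still cannot judge what a human would consider a meaningful change.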

So you can’t make it too loose. A lot of things can happen; these are real examples we’ve seen actually fail, and there are a few other things that cause tests to fail. I think the first level here is very similar: [inaudible 00:32:06] can help you. Instead of writing a lot of validations, take a screenshot once and that’s it. And of course, if you have something like a random date, add an ignore region for that specific area. But what AI helps with is understanding what a human would say: would a human say this change is mild or unnoticeable, or would a human fail it? And the nice thing is that this is where we are even right now: there are tools that can help you reduce that flakiness.
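An ignore region simply excludes a rectangle from the comparison. The minimal Python sketch below assumes rectangles given as `(x0, y0, x1, y1)` with exclusive upper bounds and uses hypothetical 4x2 grayscale screenshots whose top-right corner shows a date that changes on every run.

```python
Rect = tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive

def diff_ratio_with_ignores(a: list[list[int]], b: list[list[int]],
                            ignore: list[Rect]) -> float:
    # Pixel diff that skips regions expected to change on every run.
    def ignored(x: int, y: int) -> bool:
        return any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in ignore)

    total = changed = 0
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if ignored(x, y):
                continue
            total += 1
            changed += pa != pb
    return changed / total if total else 0.0

# Hypothetical 4x2 screenshots whose top-right corner shows today's date.
baseline = [[9, 9, 1, 2],
            [9, 9, 9, 9]]
today = [[9, 9, 3, 4],
         [9, 9, 9, 9]]
date_region: Rect = (2, 0, 4, 1)
print(diff_ratio_with_ignores(baseline, today, []),
      diff_ratio_with_ignores(baseline, today, [date_region]))
```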

The next level is: can you look across tests? If you move your logo from the left corner to the right corner, will it fail one test? In some cases it will fail a thousand tests. So the question is, do you want a thousand failed tests, or do you want it to tell you: look, the computer looked at it, and it’s one issue, the same issue. The logo thumbnail should be on the left, and now it’s on the right. One issue, and the maintenance won’t take you two days to go over; it’ll take you five seconds to say, yeah, it’s by design. It’s not a bug; the designer decided it and approved it. And you’ll learn from that.

Looking across tests applies, as I said earlier, to finding elements as well. There too, you can see that a lot of tests are failing but they all fail at the same place: an element wasn’t available or couldn’t be clicked. So the same thing can be done at level two here. So those would be [inaudible 00:33:53].
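Grouping failures by what broke, rather than by which test saw it, turns a thousand red tests into a short list of issues. This minimal Python sketch assumes each failure already carries an `element` and a `change` description; how that signature is derived in practice (from a visual diff or a locator failure) is outside the sketch, and all test names are hypothetical.

```python
from collections import defaultdict

def group_failures(failures: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    # Key each failure by what broke (element + kind of change), not by which test saw it.
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for failure in failures:
        groups[(failure["element"], failure["change"])].append(failure["test"])
    return dict(groups)

# Hypothetical run: a moved logo trips 1,000 visual checks; one test has a real bug.
failures = [{"test": f"test_{n}", "element": "logo", "change": "moved left->right"}
            for n in range(1000)]
failures.append({"test": "test_checkout", "element": "pay-button", "change": "missing"})

issues = group_failures(failures)
for (element, change), tests in issues.items():
    print(f"{element}: {change} ({len(tests)} tests)")
```

Approving the logo move once then clears 1,000 failures, leaving the missing pay button as the one issue that needs attention.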

I think the next level, of course, is based on taking what’s going on in production and generating what we think things should look like. And by the way, when I say production, it doesn’t always have to be production. It could be people playing around with it locally, because as a developer you won’t pass something along before you’ve played with it yourself. You play with it a bit, something records what you’re doing and creates an automated test out of that, and you can add those kinds of visual validations. You probably want to try different resolutions, et cetera. It can learn and help create the test for you. It can also update it and, of course, tell you whether that scenario was covered or not.

So those are the two areas we said we’d focus on today: finding elements, and then validating. There are a few others, and I guess in every area, not just end to end, you need to ask: can we create unit tests automatically from production? I think AI can help out in different areas, like risk management; we talked about that, but it wasn’t a big focus. The key point I want you to take away today is that AI can and will keep improving your tests: your authoring, how fast you author, and how stable the tests are. It can help you validate better than ever, and of course where we see it going is the connection to production. If you’re doing this right now, even writing the tests manually, coding them in [inaudible 00:35:45] or whatever you use to write tests, then also run them in production and use them as your monitor. Monitor your critical flows there in production. Those are the critical things I recommend people do.

Thank you everyone.

Yoz Grahame: Fantastic, thank you so much Oren. That was hugely informative, thank you.

So we’re going to take some questions from the audience now; we already have some. If you’re watching on the Twitch stream and have questions or comments about test automation, or about anything Oren mentioned here, please post them in the chat. We’ll be taking them for the next 25 minutes or so.

Before I get to the questions we’ve already got from the chat, I just want to ask: we probably have some people watching who are very interested in automating end-to-end tests to improve the quality of their apps, but who haven’t really set anything up yet. What’s the best way to start, to get the most bang for the buck?

Oren Rubin: Do you mean whether to start with end-to-end tests or with unit tests or integration tests?

Yoz Grahame: Any of that, yeah. I think this has happened to many of us who do web development: we have something that was a spike or a side project, something we were just trying to get out as fast as possible, and we didn’t really bother with tests. Now we’ve got something working that we have to maintain, and we’re going, oh God, I need test coverage on this.

Oren Rubin: So my recommendation is to go top-down. That means start with end to end tests, so you’ll have the biggest coverage as soon as possible. If you don’t have automation, automation that you trust, then you have to test everything manually, and people make mistakes, so we all know we need that safety net. Then you add more layers: integration tests, testing just the front end and not the back end, and, if you didn’t write unit tests, write the unit tests. That way, when something fails, you’ll know that this unit doesn’t work and this function failed. End to end tests don’t give you that right now; I think in the future they will. So first you need high coverage and the safety net as soon as possible, and then you want the drill-down, so that when something fails after you make a small change, you know exactly what failed.

Yoz Grahame: Right, so end to end is the fastest way to get just broad coverage that your happy path works.

Oren Rubin: Yes.

Yoz Grahame: And you can do that, I suppose, as you were demonstrating, with a test recorder, which is one of the easiest ways to do it. You just click through the way you normally use the app.

Oren Rubin: Yeah, I think what changed now is AI. Test recorders a couple of years back were so bad. Everyone knew me as the person who hated them and said, never use that, always code your tests, and now they’re like, oh, Oren, you’ve changed. But now I think they’re stable, more stable than what a human would write. So they don’t just save you time, they’re more stable.

Yoz Grahame: Yeah, I definitely remember using the Selenium test recorder 10 years ago and [inaudible 00:39:49] editing [inaudible 00:39:51] that it did. And it took a lot of work; as you’re saying, this stuff can be useful but incredibly brittle. But you’re also saying that once you’ve got some end to end coverage, you want to go deeper into unit tests, component tests, and things like that. Some of the questions we’ve been getting from the chat are about this. Johnny Five Is Alive, apparently a Short Circuit fan, is basically asking where unit tests and component tests fit in here. So where do you think the value is there?

Oren Rubin: The value: I do think that in the future AI can help even with unit tests. Right now we’re talking about UI testing, which means you click on something and validate the expectations later. But when you think about it, when you click on something, a function is called, and other functions or methods inside it, whether on the client or the server. They’re called with some values and you get a response later; in most cases it’s a function that returns some value. So if you look at that, you could collect a lot of examples from it, and maybe in the future we’ll see unit tests created automatically just by using your app once.

So I know there are people who like TDD, which means you write a test first and then implement based on it, and that’s great. I’m not saying not to do that, just to be clear. I’m just saying that if you don’t, for whatever reason, there’s no reason you wouldn’t want unit tests later. But of course, if you don’t write them while you’re building, adding them later will be a huge job and could take you months. So if AI can help you generate those unit tests, I think that would be great.
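The idea of deriving unit tests from a recorded session can be illustrated with a decorator that captures each call’s arguments and return value, then emits test source from those examples. This is a hypothetical sketch under simple assumptions (pure functions with `repr`-able arguments); `record_calls`, `apply_discount`, and `generate_unit_tests` are illustrative names, not part of any tool.

```python
import functools
from typing import Any, Callable

# Captured (function name, positional args, return value) triples.
recorded_calls: list[tuple[str, tuple, Any]] = []

def record_calls(fn: Callable) -> Callable:
    """Wrap fn so that every call's arguments and result are captured."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        recorded_calls.append((fn.__name__, args, result))
        return result
    return wrapper

@record_calls
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical app function called while a user clicks through checkout."""
    return price_cents * (100 - percent) // 100

# Simulate "using the app once": in reality these calls come from real interactions.
apply_discount(2000, 10)
apply_discount(9999, 0)

def generate_unit_tests(calls) -> str:
    """Turn captured calls into pytest-style test functions (as source text)."""
    lines = []
    for i, (name, args, result) in enumerate(calls):
        arg_list = ", ".join(repr(a) for a in args)
        lines.append(f"def test_{name}_{i}():")
        lines.append(f"    assert {name}({arg_list}) == {result!r}")
        lines.append("")
    return "\n".join(lines)

print(generate_unit_tests(recorded_calls))
```

Tests generated this way only pin down the behavior observed in that session, so they guard against regressions rather than proving the original behavior was correct.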

Yoz Grahame: But would you say it’s still worth it? At the moment we don’t have AI for generating unit tests, so how would you say they fit in right now? Let’s say you’ve got some end to end tests; is it still worth putting some effort into component tests?

Oren Rubin: Yes. I really believe you should have end to end tests, and unit tests are complementary: each helps in a different aspect, and each can be used by different people. For example, manual testers might use your end to end tests and help write more of them. Unit tests, I think, would probably be written by developers themselves, each owning a component and writing all the tests for that specific component. So I think you need both.

Yoz Grahame: Right, yeah, it’s certainly been useful for me when testing, because unit tests and component tests tend to be much faster than end to end tests. And so if you’re able-

Oren Rubin: Oh, sorry. I agree with you 100%, and just to add to that, I think you’d run unit tests at different times. Every [inaudible 00:43:29] you can run them; it takes seconds, so let’s run them. End to end tests are much slower, so you’re going to run them at a later stage.

Yoz Grahame: Yeah. And that’s something I find incredibly useful: getting validation as fast as possible. You really want the inner loop of development, as they call it, to be as fast and tight as possible. Unit tests also make good documentation in themselves. They’re good for describing what a function is actually meant to be doing.

Oren Rubin: Exactly, but you can still do it… I just want to add one note for everyone: when you write unit tests, still treat the unit as a black box. Don’t look at the internal implementation. If there’s a function called add that gets two integers and returns another one, don’t check some internal state in the application, because you can change the implementation, but the unit should still do the same thing.
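Oren’s black-box advice can be shown by running the same input/output checks against two different implementations of `add`. Both pass because the test never inspects internal state. This is a minimal sketch; `add_by_counting` is a deliberately odd, hypothetical second implementation.

```python
from typing import Callable

def add_simple(a: int, b: int) -> int:
    return a + b

def add_by_counting(a: int, b: int) -> int:
    """Deliberately different internals, same external contract."""
    total = a
    step = 1 if b >= 0 else -1
    for _ in range(abs(b)):
        total += step
    return total

def check_add_contract(add: Callable[[int, int], int]) -> None:
    """Black-box checks: only inputs and outputs, never internal state."""
    cases = [((1, 1), 2), ((0, 0), 0), ((-3, 5), 2), ((7, -10), -3)]
    for (a, b), expected in cases:
        actual = add(a, b)
        if actual != expected:
            raise AssertionError(f"add({a}, {b}) returned {actual}, expected {expected}")

for implementation in (add_simple, add_by_counting):
    check_add_contract(implementation)
    print(f"{implementation.__name__}: contract holds")
```

Swapping one implementation for the other requires no test changes, which is the freedom a black-box test preserves.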

Yoz Grahame: Yeah, you’re testing the interface, the external interface, external behavior of the component.

Oren Rubin: Yeah.

Yoz Grahame: That’s a really good point.

Oren Rubin: Can I give one more point?

Yoz Grahame: Yes, please.

Oren Rubin: And also API testing. If you have components, let’s say microservices, it isn’t enough that a service has its own API tests. Other services that use it should have their own tests of what they think the contract between them is, what they assume. So even if the service has its own tests, the services using it can say, no, no, no, but this is [inaudible 00:45:22] I think it is. Then when someone changes a service, and changes its tests along with it because they live in the same place, you want to run the tests of the services that depend on you, which describe how they see the contract, and check that the contract is still fulfilled.
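Consumer-driven contract testing, as Oren describes it, can be sketched by having each consumer declare the fields and types it relies on, then running those declarations against the provider’s response whenever the provider changes. This is a hand-rolled illustration, not the API of any contract-testing tool; the service names and the `get_user` response are hypothetical.

```python
from typing import Any

# Hypothetical provider: a "user service" handler returning JSON-like data.
def get_user(user_id: int) -> dict[str, Any]:
    return {"id": user_id, "name": "Ada", "email": "ada@example.com", "plan": "pro"}

# Each consumer records what it assumes about the provider's response.
CONSUMER_CONTRACTS: dict[str, dict[str, type]] = {
    "billing-service": {"id": int, "plan": str},
    "email-service": {"id": int, "email": str},
}

def verify_contracts(response: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means every consumer's contract holds."""
    violations = []
    for consumer, fields in CONSUMER_CONTRACTS.items():
        for field, expected_type in fields.items():
            if field not in response:
                violations.append(f"{consumer}: missing field '{field}'")
            elif not isinstance(response[field], expected_type):
                violations.append(f"{consumer}: '{field}' is not {expected_type.__name__}")
    return violations

print(verify_contracts(get_user(42)) or "all consumer contracts hold")
```

Running this as part of the provider’s own test suite means a change that drops or retypes a field fails in the provider’s pipeline, naming the consumer that would break.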

Yoz Grahame: So would this be by using mocks and spies? You mock something in to act as the component being used, and you make sure it’s receiving the method calls it should be receiving.

Oren Rubin: I’ll give an example I just saw a friend use, and I like it. This friend from [inaudible 00:46:07] was showing an example on [inaudible 00:46:08] and said, okay, there’s a third-party library that you’re using; I think he used something for dates. And he said, when I’m running my tests, let’s first write an adapter: that’s the interface between my application and the date library. Whether I’m using [inaudible 00:46:30] JS or another library, there’s an API I want to work with. Once I have that API, I can run my own tests against it, and it could be mocked, I don’t care, because at that point this is the API, so I can mock it.

But on the other hand, I can run additional tests that take the library by itself and test it directly. So suppose [inaudible 00:47:02] JS changes something that breaks. Obviously, when they change something they’ll change their own tests, and those will pass. But if you’ve updated to the new version of [inaudible 00:47:13] JS and it breaks your contract, the way you use it, then you want to know about it, and of course you want your own tests to fail because of that.
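The adapter approach in this date-library example can be sketched as follows. Python’s standard `datetime` module stands in for the third-party date library (its name is inaudible in the recording), and `DateAdapter`, `StdlibDateAdapter`, `FakeDateAdapter`, and `due_date_label` are hypothetical names. Application code is tested against a mock adapter, while a separate contract test exercises the real library through the adapter; that second test is the one that would catch a breaking upgrade.

```python
from datetime import date, timedelta
from typing import Protocol

class DateAdapter(Protocol):
    """The small date API our application needs (hypothetical interface)."""
    def add_days(self, day: date, n: int) -> date: ...
    def format_iso(self, day: date) -> str: ...

class StdlibDateAdapter:
    """Real adapter: delegates to the underlying library (Python's datetime here)."""
    def add_days(self, day: date, n: int) -> date:
        return day + timedelta(days=n)

    def format_iso(self, day: date) -> str:
        return day.isoformat()

class FakeDateAdapter:
    """Mock adapter for fast application tests: returns canned values."""
    def add_days(self, day: date, n: int) -> date:
        return date(2020, 4, 30)

    def format_iso(self, day: date) -> str:
        return "2020-04-30"

def due_date_label(dates: DateAdapter, start: date) -> str:
    """Application code: depends only on the adapter interface, not the library."""
    return "Due " + dates.format_iso(dates.add_days(start, 14))

def test_app_with_mock() -> None:
    # Application logic tested in isolation; the real library is never touched.
    assert due_date_label(FakeDateAdapter(), date(2020, 4, 16)) == "Due 2020-04-30"

def test_library_contract() -> None:
    # Pins down how we use the library; fails if an upgrade changes this behavior.
    real = StdlibDateAdapter()
    assert real.add_days(date(2020, 2, 28), 1) == date(2020, 2, 29)  # leap day
    assert real.format_iso(date(2020, 4, 16)) == "2020-04-16"

test_app_with_mock()
test_library_contract()
print("application tests and library contract tests pass")
```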

Yoz Grahame: Right, right. So, yeah, that’s very useful especially for catching issues with third party dependencies before they bite you. Many of us have been in the situation of casually updating a third party library because there’s a new release and suddenly stuff gets broken and you don’t realize until you deploy.

Oren Rubin: Exactly. And with unit tests in particular, you wouldn’t catch that, because when you’re doing unit tests you’re not testing the [inaudible 00:47:56], you’re not checking the external library. With end to end testing you will catch it. So the question is whether you want two different types of unit tests, one that checks the external library and one that checks your code and the contract between them, and whether you also want end to end tests. My answer: I think you should have a bit of both.

Yoz Grahame: Right. So we had a couple of questions about the semantics of testing. For example, Heidi was talking about when you demonstrated how you can find the right element by using AI to effectively do a statistical analysis of the different components. She said it’s like you’re testing the gestalt rather than the specifics of the element. Heidi, please correct me if I’m wrong, but effectively you’re trying to describe the semantic meaning of the component you’re looking for, what it’s actually for. I mean, the way the AI is doing it [inaudible 00:49:09], but I suppose this really comes in when you’re talking about TDD. Maybe these are two different things, please tell me, but how do you locate something that doesn’t exist yet when you’re writing a test?

Oren Rubin: So let me give an example. There’s TDD and BDD, so first of all let’s talk a bit about the differences, and I’ll even show them.

Yoz Grahame: Yes, please.

Oren Rubin: First of all, something you might want is one level in [inaudible 00:49:43] layer: the business level. Clicking on stuff and user interactions are the implementation level. You do want both layers in a test. This is not a good test. A good test is where I select these steps here, group them all together, and say this is the login. Then I select the other steps and, again, I can extract a function and say this is search. So this is the business level: this is where we do the login and the search. And a test should be written at different levels, which is something we didn’t focus on much today.

If you’re familiar with things like Cucumber, it actually forces you to work like that: first you write the high level, and then, inside it, the implementation. How do you implement the login? There’s another thing that [inaudible 00:50:46] also forces you to do, [inaudible 00:50:49] focus more, it’s a design [inaudible 00:50:51] that focuses more on [inaudible 00:50:53]. Here we’re using delegations, kinetic functions. If you look at it as code, you’ll see that it’s just a function. If there’s a function called search, that’s the business level, and inside it there’s the implementation of how you [crosstalk 00:51:10].

So this is kind of what I mean by BDD, and you can also write the login before you decide… oh, sorry. I wanted this thing and say checkout.

Yoz Grahame: Could you zoom that window? We’re realizing it’s slightly too high-res for people to see very well.

Oren Rubin: Oh, let me try to make it a bit bigger. Oh, sorry. I’ll try [inaudible 00:51:38]. Okay, zoom in here. I’m going to create [inaudible 00:51:41] checkout. And I’ll zoom in like this so you can see it.

Yoz Grahame: Brilliant, thank you.

Oren Rubin: [crosstalk 00:51:48] functionality working. When you write a test, you can work either from the [inaudible 00:51:54] business level down to the implementation, or backwards. As I said, ideally you want to create those steps first, whether you record or code them, and then of course go inside the checkout and create it. So ideally I suggest working at the business level first and trying to understand it from your analytics; hopefully AI will help with that and generate those skeletons automatically for you.

But then the implementation should go inside those. I think one of the other things, like you were mentioning, [inaudible 00:52:35] how do I click on something, how do I give my [inaudible 00:52:40] when I want to do something where I don’t have the implementation yet. I think what we’ll see more and more is that you’ll give one property up front, just like you do with code right now: you’ll say, I want to click on something that has ID X. But that doesn’t mean that after the test runs and works you can’t improve it and start using other properties, because the ID, as I said, someone’s going to change. So you might rely on it the first time, but after that you won’t.

Another example, and hopefully by the end of the year we’ll see more examples like this: you can also look at the mockups. Imagine you have a mockup; you can find an element based on the way it looks. Literally the pixels. It doesn’t have to be real; it could be an image. Imagine you have an image and you say, find an element based on how it looks. This is very flaky, because it will change fast: people would change the colors and the text, and that would break the test. But I think it’s enough if it runs once. If it passes once, then you can learn all the different properties, not just the way it looks externally, the pixels. Look at all the different attributes and learn, and it will not fail again.
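The "find it once, then learn its properties" idea can be sketched as a locator that first matches on a single ID, snapshots every attribute on success, and later falls back to the most similar candidate when the ID changes. This is not how Testim implements it: the equal-weight attribute score and the `0.5` threshold are assumptions standing in for whatever model a real tool uses, and `SelfHealingLocator` is a hypothetical name. The example page change (new IDs, a button turned into a div) mirrors the kind of redesign discussed in this conversation.

```python
from typing import Optional

Element = dict[str, str]  # attribute name -> value, e.g. {"tag": "button", "text": "Book"}

def similarity(learned: Element, candidate: Element) -> float:
    """Fraction of learned attributes the candidate still has, with equal weights."""
    if not learned:
        return 0.0
    matches = sum(1 for key, value in learned.items() if candidate.get(key) == value)
    return matches / len(learned)

class SelfHealingLocator:
    """Hypothetical locator: starts from one property (an ID), then learns the rest."""

    def __init__(self, element_id: str, threshold: float = 0.5) -> None:
        self.element_id = element_id
        self.threshold = threshold
        self.learned: Element = {}

    def find(self, page: list[Element]) -> Optional[Element]:
        # 1) Try the single property the test author supplied.
        for element in page:
            if element.get("id") == self.element_id:
                self.learned = dict(element)  # snapshot every attribute on success
                return element
        # 2) ID not found: fall back to the most similar candidate, if similar enough.
        if not self.learned or not page:
            return None
        best = max(page, key=lambda element: similarity(self.learned, element))
        return best if similarity(self.learned, best) >= self.threshold else None

old_page = [
    {"id": "btn-42", "tag": "button", "text": "Book", "class": "cta", "aria-label": "Book room"},
    {"id": "btn-43", "tag": "button", "text": "Cancel", "class": "secondary"},
]
# After a redesign: new IDs, and the <button> became a <div>.
new_page = [
    {"id": "x1", "tag": "div", "text": "Book", "class": "cta", "aria-label": "Book room"},
    {"id": "x2", "tag": "div", "text": "Cancel", "class": "secondary"},
]

locator = SelfHealingLocator("btn-42")
print(locator.find(old_page))  # found by ID; its attributes are learned
print(locator.find(new_page))  # found by similarity despite the new ID and tag
```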

Yoz Grahame: Yeah.

Oren Rubin: I hope I’ve made myself clear.

Yoz Grahame: Oh yeah, you have, and actually that fits very well, because one of the things I learned with TDD was red, green, refactor. Getting to green doesn’t mean you’re done. You get green when you’ve demonstrated that the initial implementation works. Then you can improve things, and at that point, now that you have something that works, you can swap in AI detection for explicit selectors and take over from there. That means your test is now more robust.

The great thing about all of this is that loads of us who deal with test automation are so used to a huge amount of breakage being effectively deliberate: something was redesigned, or something was just moved to a different point on the page, or the ID was changed, or whatever. So being able to remove a huge number of those false positives and make the tests more robust is incredibly valuable. Talking of changing things, we had a couple of questions, from both Johnny Five and Heidi, about div tags.

Now, div tags still exist. Supposedly, as we write more semantic HTML, we should be using div tags less, but that doesn’t seem to be happening. Ideally we should be using web components or something similar, where the HTML has semantic meaning when you read it. For AI, or certainly for your implementation of AI browser testing, is it able to handle those kinds of changes? If you switch from a div tag to, say, a web component, a custom element that describes a component, will it be able to handle that?

Oren Rubin: Yeah. Not only… I even want to show something; I’ll take this login sample here. By the way, at Wix, when I originally worked there about a decade ago, we used our own [inaudible 00:56:37] that we built in house, but obviously they threw away my code and started using React a couple of years ago. We had a test that ran before the change and after the change. The DOM was different, but the test still passed, because when you have several properties, it can say with high probability that this is the same element. For example, if it looks different in the DOM, a div instead of a button or vice versa, it’s still going to have the value, for example "book".

Yoz Grahame: Right.

Oren Rubin: So it can still find it. I think AI actually adds a lot of ways to find elements, things you don’t expect. One thing we noticed, for example in this case when you click on this Book button, is that all the Book buttons are exactly the same. It’s a reusable component, so they all have the same properties as each other, but the text next to each one is different. That’s unique, but how do you know to use it? What happens if you change the [inaudible 00:57:59] with the [inaudible 00:57:59]? What happens if you switch them? Which one should you click, this one or that one?

So it depends on how you look at it, and of course it would automatically suggest: okay, there are several properties. You can click the second one, or you can click the one that has Tongli next to it. But it will see that the text is actually more meaningful, and more stable, [inaudible 00:58:23] than being the second.
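The choice between "the second Book button" and "the Book button next to Tongli" can be sketched with two lookup strategies over rows of identical reusable components. The row data and function names are hypothetical; only the "Tongli" label comes from the demo. Reordering the results shows why anchoring on neighboring text is the more stable option.

```python
from typing import Optional

# Hypothetical search results: every "Book" button is the same reusable component,
# so the buttons alone are indistinguishable; only the neighboring text differs.
rows = [
    {"hotel": "Tongli", "button": {"tag": "button", "text": "Book"}},
    {"hotel": "Harbor View", "button": {"tag": "button", "text": "Book"}},
]

def row_by_index(rows: list[dict], n: int) -> Optional[dict]:
    """Brittle: 'click the Nth Book button' depends on ordering."""
    return rows[n] if 0 <= n < len(rows) else None

def row_by_neighbor_text(rows: list[dict], label: str) -> Optional[dict]:
    """More stable: 'click the Book button next to <label>'."""
    return next((row for row in rows if row["hotel"] == label), None)

swapped = list(reversed(rows))  # e.g. the results come back in a different order
print(row_by_index(rows, 0)["hotel"], "->", row_by_index(swapped, 0)["hotel"])
print(row_by_neighbor_text(swapped, "Tongli")["hotel"])
```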

Yoz Grahame: Oh, interesting.

Oren Rubin: Obviously you want to be able to edit those things and say, no, I don’t want you to use this, or I always want to use the Nth one; I don’t know, you can choose a variable, user, something, and go inside here.

Yoz Grahame: Right.

Oren Rubin: You can change all that, but by default you want the computer to actually be smarter than you and suggest things you’ve never thought of.

Yoz Grahame: Ideally, yeah. The second point is that one of the problems with divs, and with losing semantic meaning, is that it’s painful for accessibility: you lose a lot of the meaning that assistive browsers and devices are able to pick up on. I’m not sure if Testim does this, but what would you recommend for ensuring that accessibility is maintained?

Oren Rubin: Can you say that again? I’m sorry, everyone’s working from home, including me, and there’s a seven-week-old baby crying here, but her mom’s taking care of her. [crosstalk 00:59:30].

Yoz Grahame: Yeah, it sounds like we’re interrupting her meeting, and I don’t want to; it sounds like she’s complaining about it. So, talking about accessibility in particular: are there ways to verify that, whatever components you have, when you change the implementation they’re still maintaining or improving accessibility, by ensuring that the semantic meaning of the component is passed along to assistive devices or browsers? Do you know of ways of doing that, or ways of testing for accessibility generally?

Oren Rubin: Yeah, I think testing accessibility is probably a bit harder than just looking at the pixels, because when you look at the pixels you say, I don’t care if the tag is div or span or button, I care about [inaudible 01:00:34] semantics. To check the semantics more, when you do a validation you can obviously validate that something has a specific property. Another thing you can do, though I don’t recommend it, is to say, you know what, I want the threshold to be super high: I want 100%, or 90%, of the properties to match. Maybe in the future there could be more focus on that, like you must have the button tag, or, as you said for accessibility, more properties, or special accessibility validations added just like visual validations. [inaudible 01:01:22] validations, and you’d add those, or [inaudible 01:01:25] would be added automatically after recording. So I think we can get there; I don’t think we’re there right now.
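The two validation modes Oren mentions, requiring specific properties and requiring a percentage of all properties to match, can be combined in a small check. This is a hypothetical sketch, not a substitute for a real accessibility audit: `validate_element`, the attribute dictionaries, and the 75% threshold are illustrative. Using ARIA `role` as the required property means a button can become a div and still pass, as long as its semantic role survives.

```python
def validate_element(expected: dict[str, str], actual: dict[str, str],
                     required: set[str], threshold: float) -> list[str]:
    """Hypothetical semantic validation.

    Attributes in `required` must match exactly; across all expected attributes,
    at least `threshold` (a fraction from 0 to 1) must match.
    Returns a list of problems; an empty list means the element passes.
    """
    problems = []
    for key in sorted(required):
        if actual.get(key) != expected.get(key):
            problems.append(f"required '{key}': expected {expected.get(key)!r}, got {actual.get(key)!r}")
    matched = sum(1 for key, value in expected.items() if actual.get(key) == value)
    ratio = matched / len(expected) if expected else 1.0
    if ratio < threshold:
        problems.append(f"only {ratio:.0%} of properties match (need {threshold:.0%})")
    return problems

expected = {"tag": "button", "role": "button", "text": "Book", "aria-label": "Book room"}
# Refactor A: <button> became a <div> but kept role="button" for assistive technology.
div_with_role = {"tag": "div", "role": "button", "text": "Book", "aria-label": "Book room"}
# Refactor B: <button> became a bare <div>; the semantic role is lost.
bare_div = {"tag": "div", "text": "Book", "aria-label": "Book room"}

print(validate_element(expected, div_with_role, required={"role"}, threshold=0.75))
print(validate_element(expected, bare_div, required={"role"}, threshold=0.75))
```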

Yoz Grahame: Yeah, that would definitely be a really interesting thing to work on in the future. If you can derive the semantics of a component from its behavior and then validate that those semantics are effectively communicated in the HTML, or however it’s implemented, I can imagine that would also be very valuable.

Oren Rubin: I agree.

Yoz Grahame: So we’ve got to wrap up in just a couple of minutes, and I have a couple more questions for you before we do. The first one: are there techniques or technologies that you see web developers, and especially test automation people… well, actually, let’s focus on web developers generally. Are there techniques or technologies you really wish they would use more? Other than AI-driven test automation, obviously, what else do you think would give them value if they used it more or were more aware of it?

Oren Rubin: Techniques and technologies. For techniques: always have at least one higher level of abstraction. You can go in and go inside, but you should always separate the business logic from the implementation. I’ve said that before and I’ll repeat it because I think it’s critical. What this helps with, of course, is reusability, because a step can be reused in another test, passing different parameters. But I think it’s critical.

Yoz Grahame: Yeah, there’s many layers of doing that throughout [crosstalk 01:03:26].

Oren Rubin: Yes. And technology-wise, I think we’re there. If you’re talking about front end testing, web developers and front end testers could write more unit tests right now. From what I see, and again I’ve met only a few thousand people over the last few years, in the back end everyone’s doing unit tests. On the front end almost nobody does unit tests.

Yoz Grahame: Ah.

Oren Rubin: And there’s no reason for that. Right now [inaudible 01:04:00], everyone has beautiful built-in frameworks for writing unit tests, but they’re not using them. I suspect it’s more an education issue than a technology issue, because [inaudible 01:04:12] is available. But this is something I recommend people do. You don’t have to be religious about it, you don’t have to do TDD, but I think it will help you write more tests, the tests will help you grade it more, and [inaudible 01:04:28] your code will be more reusable.

Yoz Grahame: Yeah. Especially over the past couple of years, design systems have become so much more popular, especially with things like Storybook. And Storybook has fantastic built-in support for running unit tests on components.

Oren Rubin: Visual validation as well.

Yoz Grahame: Yeah, it fits right into the existing component flow. Unfortunately, we’re going to wrap it up there. I saw another question come in that maybe you could handle in the chat later, about Cucumber and the ups and downs of using it. Actually, tell you what: do you want to do one minute on what you like about Cucumber and BDD?

Oren Rubin: What I like about Cucumber is that it forces you: you can’t write a test without creating the higher level. As they say, a test starts with the spec. Something most people don’t know is that Cucumber wasn’t designed just for test automation. It was actually designed as a tool for product people, so all the [crosstalk 01:05:52]. Everyone can look at the spec level, and they wanted it tied to your code, [inaudible 01:06:00] in the real implementation, this is [inaudible 01:06:02] documentation, and actually this is the spec. It’s not just documentation that you can change without it relating to the code. We’ve all had that: beautiful documentation, then you change the code and the documentation isn’t updated.

This forces you to work in a way where it’s documentation, but it’s rigid: if it no longer matches, the test won’t run. So you have to keep it up to date, and it forces you to separate the business logic from the implementation. That’s what I love about it.

There are things they don’t force on you, like Given and When, and again this depends on how [inaudible 01:06:46] religious you are. It doesn’t have to be that way: you can just call it login, which doesn’t mean you have to call it "Given when I log in". Those are [inaudible 01:06:53]. But people should take it, just like religion, the way they want, not be forced into a specific way, and adopt it gradually to get some of the value.
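How a Cucumber-style spec stays tied to code can be sketched with a tiny step registry: each plain-language line must match a registered step definition, or the scenario fails to run, which is the rigidity Oren likes. This is not Cucumber’s API; `step`, `run_scenario`, and the step texts are hypothetical. In this sketch the Given/When/Then keywords are optional, echoing Oren’s point, whereas real Gherkin step lines start with a keyword.

```python
import re
from typing import Callable

STEPS: list[tuple[re.Pattern, Callable[..., None]]] = []
log: list[str] = []

def step(pattern: str):
    """Register a step implementation for lines matching `pattern` (hypothetical mini-registry)."""
    def decorator(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return decorator

@step(r"^I log in as (\w+)$")
def log_in(user: str) -> None:
    log.append(f"login:{user}")

@step(r'^I search for "(.+)"$')
def search_for(query: str) -> None:
    log.append(f"search:{query}")

@step(r"^I see (\d+) results$")
def see_results(count: str) -> None:
    log.append(f"expect:{int(count)}")

def run_scenario(text: str) -> None:
    """Match each spec line to a step definition; unmatched lines fail loudly."""
    for raw in text.strip().splitlines():
        line = raw.strip()
        keyword, _, rest = line.partition(" ")
        if keyword in {"Given", "When", "Then", "And"}:
            line = rest  # keywords are optional in this sketch
        for pattern, fn in STEPS:
            match = pattern.match(line)
            if match:
                fn(*match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {line!r}")

run_scenario("""
    Given I log in as alice
    When I search for "hotels"
    Then I see 3 results
""")
print(log)
```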

Yoz Grahame: Brilliant. Thank you. So we’re going to wrap up and say thank you, Oren, this has been fantastically helpful. Oren is the CEO and founder of Testim.io, which he has been using to demo during this, and I’ll switch to that view again, since we got several questions asking what tool you’re using. One thing I noticed that’s incredibly useful is the new free playground for Playwright and Puppeteer users. So if you want to show that off, it’s something everybody can use for free, I believe.

Oren Rubin: [inaudible 01:07:50] this one, play [inaudible 01:07:54] actual product. We released two things… First of all, this product I demoed here with the recorder [inaudible 01:08:03] Testim, we also released that as [inaudible 01:08:07]. And what we released this week is that you can go in, record a scenario, and export it directly, automatically. Let me just move it a bit so you can see: you can record the steps and get them in different-

Yoz Grahame: That’s great.

Oren Rubin: … And you just copy and paste, or just click it directly when someone wants to… We want to do more. I want to show everything; I’ll send the URLs for Puppeteer as well. Selenium is coming next. I want you to play around with all the different things and see the differences between the frameworks. Those are what I call the underlying infrastructure frameworks.

Yoz Grahame: Right.

Oren Rubin: So this is a completely free tool that we released.

Yoz Grahame: That’s brilliant. Okay, thank you so much for joining us, Oren. For people who are just tuning in or missed part of this, we’ll be putting the recording and the transcript online in the next couple of weeks. We’ll also be back next week and, I can’t remember, is it Sleuth? We’ll be talking to somebody from Sleuth next week. So thank you very much, Oren, for joining us. Thank you to everybody who tuned in, and see you all next week.

Oren Rubin: It was a pleasure being here, thank you everyone. See you next week.

Yoz Grahame: Thank you, bye-bye.

Matt DeLaney
Matt is the Content Marketing Manager at LaunchDarkly. He's written about emerging technologies for various SaaS companies in the past. Matt also used to work in sales at Tableau. He holds a BA in History from Cal Poly, San Luis Obispo.