Ask GH: When an A/B test fails to improve results, is it important to understand why?
I personally don't spend my time trying to figure out why an A/B test didn't improve a result. Curious if others do and if so, why they think it's important.
It's critical to analyze any result - both positive and negative - since testing is mostly about learning. Uplift is a nice side effect.
If you learn - you're able to run better tests down the line.
If the test didn't win - why was that? Was the hypothesis wrong? Or was the implementation perhaps lacking? Of course you can't know for sure, but you can theorize about it, and come up with the next iteration.
When a test fails, the correct move is to run an iterated test - instead of moving on to try something completely different.
Also - tests that fail to provide an uplift usually fail "on average" - but there are often segments where a failed test actually won (e.g. mobile traffic from Facebook), and that win was cancelled out by a loss in another segment. So each test result needs to be segmented heavily for learning reasons.
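To make the "flat on average, divergent by segment" point concrete, here's a minimal sketch with entirely invented numbers: the variant beats control on one segment, loses on another, and the two effects cancel in the aggregate.

```python
# Hypothetical illustration: a "flat" overall A/B result hiding
# opposing segment-level effects. All numbers are made up.

segments = {
    # segment: (control_visitors, control_conversions,
    #           variant_visitors, variant_conversions)
    "mobile_facebook": (2000, 60, 2000, 100),   # variant wins here
    "desktop_search":  (8000, 400, 8000, 360),  # ...but loses here
}

def rate(conversions, visitors):
    return conversions / visitors

# Sum each column across segments to get the aggregate view.
cn, cc, vn, vc = (sum(vals[i] for vals in segments.values()) for i in range(4))

print(f"Overall: control {rate(cc, cn):.2%} vs variant {rate(vc, vn):.2%}")
for name, (a, b, c, d) in segments.items():
    print(f"{name}: control {rate(b, a):.2%} vs variant {rate(d, c):.2%}")
```

Run it and the overall rates come out identical, even though neither segment is actually flat - which is exactly why a no-lift result still deserves a segmented look before you write it off.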
I've written a lengthy post on how I used A/B testing results for learning, and how that in turn led to bigger wins:
Thanks Peep, that makes a lot of sense. So figuring out the exact reason might be tough, but at least refine your hypothesis based on the lack of overall lift.
But "this doesn't move the needle" is a perfectly normal outcome for an A/B test. Which is also sometimes "this doesn't move the needle today" or "this doesn't move the needle for the specific traffic I fed it."
This is true of the whole experimentalist approach, really. Not everything you try is going to make a difference, it's the pattern of testing that pays off. You never really know which ideas are going to be worth it unless you put your ideas into action and test them.
If you only test once in a blue moon, and say 10% of those pay off, you're going to be way behind the person that tests every week - even if only 10% of the tests they launch pay off. Numbers made up - the ratio will never be THAT high, and if it is, you're fooling yourself with data - but it DOES depend strongly on how well you choose your tests. There is definitely still a place for ambition and design thinking.
Important to understand why? If you're running a button test or something, and nothing happens, I wouldn't waste much time on it. If you tried something more drastic, or something that you really expected to make a difference in some direction - you can always follow up on it, yeah. Talk to individuals, set up more tests. You can learn stuff in one conversation that you would never think to ask or test.
Also, seriously, sample size will bite you. Traffic characteristics will bite you - why people are visiting, whether or not they have an initially high likelihood of doing whatever action you're studying. They also like to team up - say you shoot 20,000 hits at each version of a landing page, but only 2% of those were highly qualified traffic. What you've really done is show each page version to ~400 possible customers while mildly amusing 19,600 random passers-by.
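That dilution effect can be put in numbers. This is a rough sketch using the standard two-proportion power approximation; the traffic figures and 10% base rate are invented for illustration, not taken from any real test.

```python
# Hedged sketch: only the qualified slice of traffic can convert,
# so the effective sample is far smaller than the raw hit count.
# All inputs below are assumed values for illustration.
from math import sqrt

def min_detectable_lift(n_per_arm, base_rate, z_alpha=1.96, z_beta=0.84):
    """Smallest absolute lift detectable at ~95% confidence / 80% power
    (standard two-proportion approximation)."""
    return (z_alpha + z_beta) * sqrt(2 * base_rate * (1 - base_rate) / n_per_arm)

raw_hits = 20_000                   # hits per variant
qualified = int(raw_hits * 0.02)    # only 2% are real prospects -> 400 per arm
base_rate = 0.10                    # assumed conversion rate among prospects

print(f"Qualified visitors per arm: {qualified}")
print(f"Min detectable absolute lift: {min_detectable_lift(qualified, base_rate):.1%}")
```

With 400 qualified visitors per arm, the smallest detectable lift works out to several percentage points absolute - a huge relative improvement. Anything smaller just reads as "no discernible result", regardless of whether the change actually worked.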
It depends. Like @grayj suggested, if you're testing something as simple as the colour of the button, it may not warrant analysis. But on the other hand, the position of the CTA button relative to the pitch might be worth pondering over.
Similarly, trying to understand why certain CTAs are more successful than others might help understand the behaviour of potential customers.
But more than any one Aha moment, it's the small incremental benefits that bring out the cumulative results.
"What's worth pondering over" is really a huge part of what you focus on. Sometimes a test is just a test. Other times you say, "well that's weird, I didn't expect THAT to happen!" That's when the magic happens.
No discernible result when you really expected SOME result is still data. Maybe it means your tracking is screwed up. Maybe it means your assumptions need revising. Either way, it's information you can build on.
If a test fails, there is something to learn. I may not spend time digging into the cause, but I try to glean anything I can from the outcome. Finding out something from a failed test is important even if it doesn't lead to scaling or making a change. Am I wasting my time?
Yes, if the test is relevant to a hypothesis about positioning, targeting, and messaging. Without learning why those tests fail, it will be more difficult to know when to pivot/persevere.
In theory yes. In practice I wouldn't spend much time wondering why.
The reality is that most startups only have enough time/money/manpower to run a limited number of tests each week.
I've never worked at a company that DIDN'T have an ever-expanding list of future tests to run. If you spent all that effort on understanding why a test didn't work and then trying it again, with a slight variation, you'd never get anywhere.
Unless you have a strong reason to believe that re-running a test will yield different results, it's probably better to focus on running the next test on your list.
That said, there are plenty of situations where re-running a test is warranted. Trying different traffic sources or customer segments is one. If you've recently made a significant change to your product, then yes, re-running a test might conceivably yield different results.
But I wouldn't spend months testing different button colours. Google can afford to spend time/money/manpower on testing 41 shades of blue; startups need to chase bigger wins.
If you have a well-formed hypothesis and creative that aligns with that hypothesis, then it's virtually impossible NOT to understand why a test failed... it's because your hypothesis was not valid. The key to successful testing is isolating your creative changes to a single variable (unless you're running a multivariate test)... and ideally to a single measure of success. With more than 1,000 tests under my belt, I've tried to shortcut the "path to perfect learning" (from tests), and I've learned there are no shortcuts... only well executed tests.
I'd say that it's equally important to think a bit more deeply about the tests that succeed as well as the tests that fail. Are there alternative hypotheses that could explain the results, either way?
For example, one client was trying some alternate headline copy. Some were working significantly better. Others… not so much.
After looking at the copy in context, we saw another pattern unrelated to the copy's message. Some of the headlines were wrapping to two lines. Others were not. The ones that wrapped to two lines pushed the rest of the copy down so that it crossed the fold at some common screen sizes, making it more obvious that you needed to scroll.
We changed the layout so that always happened, and suddenly all of the copy performed "well". What we'd discovered was a visual design issue, not a copywriting issue. And that discovery helped us make other improvements elsewhere in the site.
For me the point of A/B testing is to learn. And that learning comes no matter what the result of the test is.