
A/B testing is fun. With so many easy-to-use tools around, anyone can (and should) do it. However, there's more to it than just setting up a test: plenty of companies waste their time and money by making these 11 mistakes.

Here are the top mistakes the author sees again and again. Are you guilty of any of them?

  1. A/B tests are called early

  2. Tests are not run for full weeks

  3. A/B split testing is done without enough traffic (or conversions) to support it

  4. Tests are not based on a hypothesis

  5. Test data is not sent to Google Analytics

  6. Precious time and traffic are wasted on stupid tests

  7. They give up after the first test fails

  8. They don't understand false positives

  9. They are running multiple tests at the same time with overlapping traffic

  10. They're ignoring small gains

  11. They're not running tests at all times

Read the article to get full context on what each of these mistakes means.


    • Earl Lear (@Learjet)

      This is a great article, I was wondering at what point Peep was going to say that 'statistical significance' was met and was a bit surprised to see 250 actions. I've thought that 200 actions was the target number for a long time, I heard it way back listening to a live conference call with the legend Gary Halbert (RIP Gary).

      His site is still one of the 'most valuable sites on the internet'! :-)

        • Peep Laja (@peeplaja)

          250 conversions per variation is not a number set in stone - it's not a law of the universe, but rather a helpful ballpark. 200 is also good. Sites that get very little traffic could settle on 100 conversions per variation if 95% statistical significance has been achieved by that time.

          The more total conversions you have, the bigger the sample size, and the higher the chance that the result is accurate.
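          Peep's point can be sketched with a standard two-proportion z-test (a common way A/B tools compute significance; the conversion numbers below are made up for illustration, not from the article):

```python
import math

def z_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test p-value (normal approximation).
    conv_* = conversions, n_* = visitors per variation."""
    pa, pb = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (pb - pa) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# The same observed rates (5% vs 6%), at two different sample sizes:
print(z_test_pvalue(100, 2000, 120, 2000))    # ~100 conversions each: p ≈ 0.17, not significant
print(z_test_pvalue(500, 10000, 600, 10000))  # 5x the conversions: p ≈ 0.002, significant at 95%
```

          Identical observed lift, opposite conclusions - the only difference is how many conversions back the numbers up.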

            • Earl Lear (@Learjet)

              Thanks Peep! :-)

                • James Martinez (@propjames)

                  Note the "if statistical significance of 95% has been achieved by that time." This is very important and could be explained pretty well if I could attach images, but I'll try to explain it anyway.

                  Imagine two graphs of conversion rates over time where the variations on graph 1 are very far apart and have been apart since the beginning, vs graph 2 where the variations are close together and have sometimes overlapped. The statistical significance of graph 1 will be much higher than the statistical significance of graph 2.

                  Hope this helps. :)

    • Shana Carp (@shanac)

      Actually, the way #1 is presented is an issue - never check in the middle of a test if you are using a frequentist model. Precalculate the sample size you need first! Otherwise you need more samples.

        • Morgan Brown (@morgan)

          Was just going to weigh in on that Shana. I always go to this article on A/B testing when talking about significance vs. sample size.

          The key quote is:

          If you run experiments: the best way to avoid repeated significance testing errors is to not test significance repeatedly. Decide on a sample size in advance and wait until the experiment is over before you start believing the “chance of beating original” figures that the A/B testing software gives you. “Peeking” at the data is OK as long as you can restrain yourself from stopping an experiment before it has run its course. I know this goes against something in human nature, so perhaps the best advice is: no peeking!
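          The "no peeking" warning can be demonstrated with a quick simulation: run many A/A tests (both variations truly identical), check significance at several interim points, and count how often a "winner" is falsely declared. All numbers here are illustrative assumptions, not from the article:

```python
import math
import random

def peeking_false_positive_rate(n_experiments=500, n_per_arm=2000, checks=20, seed=42):
    """Simulate A/A tests and call a 'winner' the first time a
    two-proportion z-test crosses 95% significance at any interim
    check. Returns the fraction of experiments falsely declared
    significant; without peeking it should be about 5%."""
    rng = random.Random(seed)
    p = 0.05                    # identical true conversion rate in both arms
    step = n_per_arm // checks  # visitors added per arm between checks
    false_positives = 0
    for _ in range(n_experiments):
        a = b = 0
        for i in range(1, checks + 1):
            a += sum(rng.random() < p for _ in range(step))
            b += sum(rng.random() < p for _ in range(step))
            n = i * step
            pooled = (a + b) / (2 * n)
            se = math.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(a - b) / n / se > 1.96:  # 95% two-sided threshold
                false_positives += 1
                break
    return false_positives / n_experiments

rate = peeking_false_positive_rate()
print(f"false positive rate with repeated peeking: {rate:.0%}")  # well above the nominal 5%
```

          Stopping at the first significant-looking interim result inflates the error rate far past the 5% you thought you were accepting - which is exactly why the quote says to fix the sample size in advance.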

            • Peep Laja (@peeplaja)

              The challenge with this is that you don't know what the right sample size is, since there is NO WAY to tell in advance what the conversion rates are going to be.

              If you go for the sample size needed to detect a 50% lift, but version B is actually only 5% better, a FAR LARGER sample size is needed to achieve significance. So if you call the test as soon as "the sample size number has been met", it's like a coin toss - a 50% chance that your "winner" is actually a loser.
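              Peep's point comes from how fast the required sample size grows as the true lift shrinks. A rough sketch using the standard two-proportion power formula (95% confidence, 80% power; the 5% baseline rate is an assumed example):

```python
import math

def sample_size_per_variation(base_rate, lift, alpha_z=1.96, power_z=0.8416):
    """Approximate visitors needed per variation to detect a relative
    `lift` over `base_rate`, using the standard two-proportion power
    formula. alpha_z = z for 95% two-sided confidence, power_z = z
    for 80% power."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((alpha_z + power_z) ** 2 * variance / (p2 - p1) ** 2)

# Baseline 5% conversion rate:
print(sample_size_per_variation(0.05, 0.50))  # sized for a 50% lift: ~1,500 per variation
print(sample_size_per_variation(0.05, 0.05))  # the real 5% lift needs ~120,000 per variation
```

              Planning for a 50% lift when the true lift is 5% leaves you short by roughly two orders of magnitude - the test "hits its number" long before it can actually distinguish the variations.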

                • Morgan Brown (@morgan)

                  That's a good point. I think what Sean says on the topic is pretty valuable, roughly:

                  When you're optimizing for growth, you're looking for opportunity, not for statistical precision.

                  What he meant, of course, is that there are testing-cost tradeoffs to getting it exactly perfect, and those costs may outweigh the downsides of just moving forward.

    • Morgan Brown (@morgan)

      I've made a bunch of these over time. I love the "not running tests at all times" mistake. Continuous improvement is the name of the game.

      My favorite quote from this is:

      As an optimizer, your job is to figure out the truth. You have to put your ego aside. It’s very human to get attached to your hypothesis or design treatment, and it can hurt when your best hypotheses end up not being significantly different.

      This is so true. I've spent countless hours on new landing pages that look better, have better messaging, better visual hierarchy, and utterly fail to beat the incumbent. Just have to keep going and be able to put aside your subjective views and move on to the next test.

    • Miron Lulic (@Miron)

      What A/B testing tools do you guys use?