
Ask GH

  • SC

    Shana Carp

    about 4 years ago #

    Hi, So I'm the Shana mentioned below

    Short answer: The best form of testing is the one you actually do. If you have the facilities to build out a good version of a bandit, and people who understand how it works (the statistical paradigm it exists in is not the one that a/b/mvt uses at all), it will probably be more successful for your site, because it doesn't waste time showing the crappy variation, and you can stop it whenever works for you and your business (past some real minimum). However, if for some reason you can't run MAB, this should NOT stop you from testing.

    Long Answer:

    MABs, particularly the UCB group of multi-armed bandits (http://lane.compbio.cmu.edu/courses/slides_ucb.pdf ), have this interesting property known as minimizing regret. Basically, as the difference in the clickthrough rates starts widening, the worse variation is shown less and less (the number of times it is shown grows only logarithmically), and its share of traffic eventually approaches 0. You can therefore exploit your winner as it is winning, during the test itself.
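
    To make the "shown less and less" behaviour concrete, here is a minimal sketch of UCB1, one member of the UCB family, run against simulated visitors. The function name, the click rates (0.2 and 0.5) and the number of rounds are made up for illustration; this is not any particular company's implementation.

```python
import math
import random


def ucb1_simulation(true_rates, rounds, seed=0):
    """Simulate UCB1 on variations with fixed (hypothetical) click rates.

    Returns how many times each variation was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    shows = [0] * n_arms   # times each variation was shown
    clicks = [0] * n_arms  # clicks each variation received

    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Show every variation once so each has an observed rate.
            arm = t - 1
        else:
            # Pick the variation with the highest upper confidence bound:
            # observed rate plus a bonus that shrinks as it is shown more.
            arm = max(
                range(n_arms),
                key=lambda i: clicks[i] / shows[i]
                + math.sqrt(2 * math.log(t) / shows[i]),
            )
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            clicks[arm] += 1
    return shows


# Assumed rates: variation 0 converts at 20%, variation 1 at 50%.
shows = ucb1_simulation([0.2, 0.5], rounds=10000)
print("times shown per variation:", shows)
```

    In a run like this the weaker variation should end up with only a small fraction of the traffic, which is the regret-minimizing behaviour described above. With a much smaller gap between the rates, the weaker variation keeps getting shown for longer.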

    MAB is more widely used than discussed. If you read between the lines, companies like Upworthy and Buzzfeed are using it to drive virality. I can personally testify that at least one famous growth hacker (not me) is using them. It works extremely well for things that are smaller (like buttons, or one form field), and when you want results quickly (e.g. Upworthy is basically performing MAB in effectively real time).

    MAB, as well as a straight Bayesian a/b test, also has the benefit that it scales up and down - all you need to do is decide how much difference between your variations you can accept. You can accept a narrower set of differences (eg: .1 vs .01) if you are smaller and if it would take too long to have more of a credible interval covered. You can also check in the middle of a test for the same reason - something you can't do at all with any of the mainstream a/b/mvt test suites out there (currently; talk to me in January). Basically, all Bayesian methods, but particularly UCB/bandit, are very sensitive to the realities of business, since they don't force you to have a true "test period" that needs to conform to certain rules before they work.

    (this post explains why in detail - http://www.chrisstucchio.com/blog/2013/bayesian_analysis_conversion_rates.html )
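
    As a rough illustration of checking a Bayesian a/b test in the middle of a run, the sketch below puts a Beta posterior on each variation's conversion rate and estimates the probability that B beats A by sampling. The uniform Beta(1, 1) prior, the function name and the counts are assumptions for the example; the linked post works through its own stopping rule, which this does not reproduce.

```python
import random


def prob_b_beats_a(clicks_a, views_a, clicks_b, views_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate is Beta(1 + clicks, 1 + non-clicks).
        rate_a = rng.betavariate(1 + clicks_a, 1 + views_a - clicks_a)
        rate_b = rng.betavariate(1 + clicks_b, 1 + views_b - clicks_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical mid-test counts: A has 50 clicks in 1000 views, B has 80 in 1000.
p = prob_b_beats_a(50, 1000, 80, 1000)
print("estimated probability that B beats A:", p)
```

    The same function can be called again at any point as more data arrives, since it only summarises what has been observed so far.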

    However:
    MAB is much, much more difficult to pull off computationally unless you know what you are doing. The functional cost of doing it is basically the cost of three engineers - a data scientist, one normal guy to turn what the data scientist says into code and scale it, and one dev ops person. (Though the last two could probably do double duty on your team.) It is really rare to find data scientists who program extremely well.

    There aren't good MAB tools for the masses (yet). For example - none of the big mainstream transactional email providers (including some famous for their data and tools involving data) provide MAB.

    Bayesian statistics isn't usually taught (and definitely not usually taught in business school). As a result, convincing marketers to use any Bayesian test, let alone MAB, is difficult because they don't understand it. Further, there are also no tests to experience it. (I suspect that if the toolset changed and was heavily evangelized, this would also change.)

    PS: Chris is a friend, and I know for a fact he is currently in London.

    • EG

      Edward Gotham

      about 4 years ago #

      Hi Shana,

      Thank you for your detailed reply. From my understanding MAB is the best testing to use followed by bayesian testing. This is due to the optimisation that occurs whilst the test is running. Is this the sort of testing that companies such as highconversion do?

      What is your opinion with regards to optimisation over continuous time. Do you ever encounter problems where the results from a test over a particular discrete time period are only optimal in that specific time period and if used over a different discrete time period the results are suboptimal? Do you suggest always running a small control group to make sure your test results are correct over continuous time?

      Definitely let me know about the tool you are releasing in Jan. At the moment what tools would you suggest using? Also I'm very keen to meet up with Chris to learn more about Bayesian. Is there any chance you could put us in touch? Thanks again!

      Ed

      • SC

        Shana Carp

        about 4 years ago #

        Let's take this piece by piece

        >Thank you for your detailed reply. From my understanding MAB is the best testing to use followed by bayesian testing.

        MAB is a kind of bayesian test. There are others. There are also subvariations of MAB.
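
        One well-known bandit variation that is explicitly Bayesian is Thompson sampling. It is not named in this thread, so treat the sketch below as an outside illustration: the function name, the Beta(1, 1) priors and the click rates are all assumptions.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Thompson sampling on variations with fixed (hypothetical) rates.

    Returns how many times each variation was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible rate per variation from its Beta posterior,
        # then show the variation whose draw is highest.
        draws = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(n_arms)
        ]
        arm = draws.index(max(draws))
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[i] + failures[i] for i in range(n_arms)]


# Assumed rates: variation 0 converts at 20%, variation 1 at 50%.
ts_shows = thompson_sampling([0.2, 0.5], rounds=5000)
print("times shown per variation:", ts_shows)
```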

        >This is due to the optimisation that occurs whilst the test is running.

        Aka: minimizing regret, which is how mathematicians refer to this optimization. I believe I discussed this above.

        >Is this the sort of testing that companies such as highconversion do?

        Unclear to me, as I have not used their software. I did read their optimization patent late last night - it doesn't go into enough detail. Their marketing material does indicate they are using a black box, but there are many bayesian algorithms that could go into a black box.

        >What is your opinion with regards to optimisation over continuous time. Do you ever encounter problems where the results from a test over a particular discrete time period are only optimal in that specific time period and if used over a different discrete time period the results are suboptimal?

        Yes.

        http://www.chrisstucchio.com/blog/2013/time_varying_conversion_rates.html

        Design changes would be solved similarly, except without Jacobi diffusion. You'd need a different underlying model because design has more discrete end periods (there are differences between, say, art deco and art nouveau).

        >Do you suggest always running a small control group to make sure your test results are correct over continuous time?

        No. Though if I am running more standard a/b tests, I will re-run experiments regularly, by establishing that experiment one only covers some discrete time period, and then a copy of it covers another, etc.

        >Definitely let me know about the tool you are releasing in Jan.

        ok, shoot me an email at shana dot carp at gmail :) (smiley face is not part of my email)

        >At the moment what tools would you suggest using?
        That's a consulting gig in and of itself. Not every tool is right for everyone or for every project. I'm not sure what to say without a long talk to you, and I don't want to give you a tool that is wrong for you either.

        >Also I'm very keen to meet up with Chris to learn more about Bayesian. Is there any chance you could put us in touch? Thanks again!

        See the email address above :)

  • SE

    Sean Ellis

    about 4 years ago #

    Shana Carp can really answer this one much better than I can, but I find the promise of multi-armed bandit MVT very appealing. The more you can concentrate traffic on the likely winner, the less "expensive" testing becomes (in terms of missed potential). Hopefully Shana will chime in (I'll ping her on Twitter).
