
Ask GH

I ran an A/B test in which I split my users into two groups and showed each group a different mix of interstitial and banner ads in my app. Since each user sees many ad impressions in one day, I'm using revenue per user as the test metric. When looking for significance, I'm using the number of users as my audience and the revenue as the result. How would you measure significance using these metrics?

  • JA

    Justin Adelson

    10 months ago #

    I think we need to learn more about your revenue generation model and what your hypothesis is for this test. Is this test meant to determine which ads generate the most clicks/revenue, or which group of users drives the most revenue? For example, if you earn revenue every time someone clicks on an ad, then I would focus on impressions and link clicks (i.e. click-through rate). If your revenue model is CPM-based, then you want to get as many eyes on the ads as possible, which means you want to focus on app usage and return rates (i.e. how many users in a specific group use the app on a regular basis).

    When it comes to measuring data significance, I generally use the number of users divided by the conversion quantity:
    Landing Page Views/Leads
    Impressions/Link Clicks

    I use Neil Patel's A/B Significance Calculator for determining winners: https://neilpatel.com/ab-testing-calculator/.

    All that being said, I am not completely sold that revenue/user is the correct metric for determining significance, at least not without more information. I would love to hear what other people think.

    • NM

      Noah Manion

      9 months ago #

      Sure, we're measuring the revenue generated from banner & native ad impressions for users that use our app. What we're looking at is the difference between two network mixes (we run our own ad mediation system), so we'll set up two different network mixes:
      50% network A, 25% network B, 25% network C
      0% network A, 50% network B, 50% network C

      We're not able to accurately measure the revenue that a single, specific impression generates. But we are able to back into it using the following data: the total revenue generated by a network and the number of impressions each user served on that network. If we sum up the impressions generated by all users for a network and then divide the total revenue from that network by that sum, we can extrapolate the value of a single impression. Since we're able to assign a "value" to each impression, and we know how many impressions each user has generated, we can get a rough idea of the revenue generated by each user.
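      The back-calculation described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name, the input shapes and the sample numbers are all hypothetical, and it assumes every impression on a given network is worth the same amount.

```python
from collections import defaultdict

def estimate_user_revenue(network_revenue, impressions):
    """Estimate revenue per user from network totals.

    network_revenue: {network: total revenue from that network}
    impressions: list of (user_id, network, impression_count)
    Both input shapes are hypothetical, chosen for illustration.
    """
    # Sum the impressions served on each network across all users.
    total_impressions = defaultdict(int)
    for _user, network, count in impressions:
        total_impressions[network] += count

    # Average value of one impression on each network.
    value_per_impression = {
        network: network_revenue[network] / total
        for network, total in total_impressions.items()
        if total > 0
    }

    # Credit each user with impressions * average value per impression.
    user_revenue = defaultdict(float)
    for user, network, count in impressions:
        user_revenue[user] += count * value_per_impression.get(network, 0.0)
    return dict(user_revenue)

# Made-up numbers, for illustration only.
network_revenue = {"A": 100.0, "B": 50.0}
impressions = [("u1", "A", 300), ("u2", "A", 700), ("u1", "B", 500)]
revenue = estimate_user_revenue(network_revenue, impressions)
print(revenue)
```

      The per-user values produced this way add up to the networks' total revenue, but they are estimates: any real difference in value between individual impressions is averaged out.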
      What we're looking to do is take the sum of revenue for users in each group and figure out which is the better mix. I can tell which group generated the most revenue and which group generated the most revenue per user, but what I can't seem to figure out is how to gauge statistical significance, since every tool I see for measuring it can't return a p-value when the conversion rate is over 1 (and the average user has generated more than $1 in revenue). Say group A contained 50,000 users and generated $150,000, and group B contained 60,000 users who generated $175,000 in revenue. I know group A averages $3/user and group B averages about $2.92/user, so I know group A's average is higher, but how would I tell if my results are significant? Is this even the right calculation to use?
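      One common way to test a difference in a continuous metric such as revenue per user is Welch's unequal-variance test on the per-user values. Group totals and user counts alone are not enough for this, because the test also needs the spread of revenue across users. The sketch below is a minimal illustration, not a definitive method: the per-user revenue lists are simulated (hypothetical), and the p-value uses a normal approximation that is only reasonable when both groups are large.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_test(a, b):
    """Compare mean revenue per user between two groups.

    a, b: lists of per-user revenue.
    Returns (difference in means, z statistic, two-sided p-value).
    The p-value uses a normal approximation, so it assumes large groups.
    """
    mean_a, mean_b = mean(a), mean(b)
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean_a - mean_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return mean_a - mean_b, z, p_value

# Simulated, hypothetical per-user revenue; real values would come from your own data.
rng = random.Random(0)
group_a = [rng.expovariate(1 / 3.00) for _ in range(5000)]
group_b = [rng.expovariate(1 / 2.92) for _ in range(6000)]

diff, z, p_value = welch_test(group_a, group_b)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

      A p-value below the chosen threshold (commonly 0.05) would suggest that the difference in average revenue per user is unlikely to be due to chance alone.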

      • JA

        Justin Adelson

        8 months ago #

        I've been taught to use statistical significance for specific conversions or actions (e.g. clicks, leads, purchases, etc.). This is different from your situation because you are trying to determine which mix of advertising networks drives the highest value per user.

        I am going to preface what I am about to say by noting that I do not have a fact-driven answer. That being said, if your revenue stream is linear (i.e. one straight line or trend), I would say that Group A is the statistical winner moving forward. If your revenue stream is exponential (i.e. your revenue does not increase at a consistent growth rate), then it is harder for you to determine whether one group is going to make you more revenue in the future (thus the purpose of this question).

        I suppose one option is to count the high-value users in each group (i.e. users whose value is greater than some threshold X) and use that as a quantitative metric. For example, if group A has 10,000 high-value users and group B has 5,000 high-value users, group A has twice as many and would be the more likely of the two to produce better revenue moving forward.
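        Counting high-value users turns the problem back into a conversion-style comparison, which a two-proportion z-test can handle. The sketch below is a hedged illustration: the function name is hypothetical, and the group sizes (50,000 and 60,000) are borrowed from the earlier example in this thread as an assumption, since no group sizes were given for the high-value counts.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    hits_*: number of high-value users in each group
    n_*: total number of users in each group
    Returns (share_a, share_b, z statistic, p-value).
    """
    share_a, share_b = hits_a / n_a, hits_b / n_b
    # Pooled share under the null hypothesis that both groups are the same.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (share_a - share_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return share_a, share_b, z, p_value

# High-value counts from the example above; group sizes are an assumption.
share_a, share_b, z, p_value = two_proportion_z_test(10_000, 50_000, 5_000, 60_000)
print(f"share A = {share_a:.3f}, share B = {share_b:.3f}, z = {z:.1f}, p = {p_value:.4f}")
```

        The result depends on where the threshold X is set, so it is worth fixing X before looking at the data.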

      • MS

        Martijn Scheijbeler

        6 months ago #

        Your question makes sense, and I have the same problem. It seems that your metric could be flawed: if a few outliers generate far more revenue than everyone else, you still don't know for sure whether the average is really higher than for the other networks that you're using. If it's systematically higher than for the other ad networks, you might have something to work with. In that case, you could use the median/average and go from there.

        Is there a chance that you can use a metric of purchases/user, or quantity/user so you can see if they add more products? If you're just looking at the revenue generated I fear your analysis might be too biased towards high-paying customers. Over the long term, you could use the number of transactions without a cohort then to determine if they're significantly higher or not.
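        One way to check how much a few outliers drive the result is a percentile bootstrap of the difference between the groups, run once on the mean and once on the median. This is a minimal sketch under assumptions: the function name and the sample data are hypothetical, and the bootstrap is a swapped-in technique rather than something proposed in this thread.

```python
import random
from statistics import fmean, median

def bootstrap_diff_ci(a, b, stat=fmean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(a) - stat(b).

    a, b: lists of per-user revenue.
    stat: summary function, e.g. fmean or median.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(stat(resample_a) - stat(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical data: mostly small revenues plus a few very large outliers in group A.
rng = random.Random(1)
group_a = [rng.expovariate(1 / 3.0) for _ in range(300)] + [250.0, 400.0]
group_b = [rng.expovariate(1 / 3.0) for _ in range(300)]

print("mean diff CI:  ", bootstrap_diff_ci(group_a, group_b, stat=fmean))
print("median diff CI:", bootstrap_diff_ci(group_a, group_b, stat=median))
```

        If the interval for the mean excludes zero but the interval for the median does not, that would suggest the difference comes mostly from a handful of high-revenue users.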

  • SS

    Sofia s

    6 months ago #

    Awesome post

  • YV

    Yannick Veys

    10 days ago #

    You would have to calculate what the per user revenue is first.

    So let's say you have 2 groups.

    Group 1: 1000 people total revenue 15,000
    Group 2: 700 people total revenue 14,000

    Now calculate the revenue per user

    Group 1: 15,000 / 1000 = 15
    Group 2: 14,000 / 700 = 20

    You can use these two numbers to see if you have a significant change in your revenue numbers with a statistical significance calculator like this one: https://yannickveys.com/free-tools/statistical-significance-calculator/

    With these numbers you're still below the 95% significance level. So you either need a bigger difference in revenue (round them to a whole number) or you need a bigger audience.
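    For a revenue metric, the two averages above (15 and 20) and the group sizes do not determine significance on their own; the spread of per-user revenue matters as well. The sketch below runs a z-test from summary statistics to show that dependence. The standard deviations are hypothetical values picked purely for illustration, and the function name is an assumption.

```python
import math
from statistics import NormalDist

def z_test_from_summary(n_a, mean_a, sd_a, n_b, mean_b, sd_b):
    """Two-sided z-test for a difference in means from summary statistics.

    n_*: users per group, mean_*: revenue per user,
    sd_*: standard deviation of per-user revenue.
    Uses a normal approximation, so it assumes reasonably large groups.
    """
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Group sizes and averages from the example above; the standard deviations are hypothetical.
for sd in (40, 80):
    z, p_value = z_test_from_summary(1000, 15, sd, 700, 20, sd)
    print(f"assumed sd = {sd}: z = {z:.2f}, p = {p_value:.3f}")
```

    The same pair of averages can come out significant or not depending on the spread, which is why the per-user data (or at least its standard deviation) is needed.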

    Good luck with your test!
