Skip to content

Simple revenue funnel 🛒

We look at a toy website funnel in this example. Imagine a fictitious website that sells stuff. Users go to the website, are presented with items, can add them to their cart, and then can buy them.

import pandas as pd
import locale

locale.setlocale(locale.LC_MONETARY, 'en_US.UTF-8')
def fmt_currency(x):
    return locale.currency(x, grouping=True)

traffic = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-01', '2018-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2018-02-01', '2018-02-01', '2018-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'impressions': [1000, 2000, 2500, 1000, 2150, 2000, 50, 2000, 2500, 2500, 2150, 2000],
    'clicks': [150, 150, 250, 120, 200, 400, 20, 300, 250, 1000, 323, 320],
    'conversions': [120, 150, 125, 160, 145, 166, 10, 150, 125, 500, 145, 166],
    'revenue': ['$8,600', '$9,400', '$10,750', '$9,055', '$8,739', '$10,147', '$500', '$11,400', '$8,750', '$50,000', '$10,739', '$12,147'],
})
traffic['date'] = pd.to_datetime(traffic['date'])
traffic['revenue'] = traffic['revenue'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
traffic.style.format({'revenue': fmt_currency, 'date': lambda x: x.strftime('%Y-%m-%d')}, na_rep='N/A')
  date group impressions clicks conversions revenue
0 2018-01-01 A 1000 150 120 $8,600.00
1 2018-01-01 B 2000 150 150 $9,400.00
2 2018-01-01 C 2500 250 125 $10,750.00
3 2019-01-01 A 1000 120 160 $9,055.00
4 2019-01-01 B 2150 200 145 $8,739.00
5 2019-01-01 C 2000 400 166 $10,147.00
6 2018-02-01 A 50 20 10 $500.00
7 2018-02-01 B 2000 300 150 $11,400.00
8 2018-02-01 C 2500 250 125 $8,750.00
9 2019-02-01 A 2500 1000 500 $50,000.00
10 2019-02-01 B 2150 323 145 $10,739.00
11 2019-02-01 C 2000 320 166 $12,147.00

The users are bucketed into 3 groups: A, B, C. We've also bucketed impressions/clicks/conversions/revenue figures by month of the year.

We're interested in understanding how the metrics evolve over time. The basic method is to calculate each metric separately. To keep things simple, we can do this for each year.

pd.DataFrame({
    'impressions': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .impressions.sum()
    ),
    'click_rate': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .apply(lambda x: x.clicks.sum() / x.impressions.sum(), include_groups=False)
    ),
    'conversion_rate': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .apply(lambda x: x.conversions.sum() / x.clicks.sum(), include_groups=False)
    ),
    'average_spend': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .apply(lambda x: x.revenue.sum() / x.conversions.sum(), include_groups=False)
    ),
    'revenue': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .revenue.sum()
    )
}).style.format({'average_spend': fmt_currency, 'revenue': fmt_currency}, na_rep='')
  impressions click_rate conversion_rate average_spend revenue
year          
2018 10050 0.111443 0.607143 $72.65 $49,400.00
2019 11800 0.200254 0.542531 $78.65 $100,827.00

In and of itself, this is already quite interesting. However, what we really want to know is how the change of each metric contributes to the change in revenue. This is where icanexplain comes in.

import icanexplain as ice

explainer = ice.FunnelExplainer(
    funnel=['impressions', 'clicks', 'conversions', 'revenue'],
    period='year',
    group=['month', 'group']
)
traffic = traffic.assign(
    month=traffic.date.dt.month,
    year=traffic.date.dt.year
)
explanation = explainer(traffic)
explanation.style.format(fmt_currency).set_properties(**{'text-align': 'right'})
      impressions_contribution clicks_over_impressions_contribution conversions_over_clicks_contribution revenue_over_conversions_contribution
year month group        
2019 1 A $0.00 -$1,720.00 $4,586.67 -$2,411.67
B $705.00 $2,428.33 -$3,446.67 -$347.67
C -$2,150.00 $8,600.00 -$2,924.00 -$4,129.00
2 A $24,500.00 $0.00 $0.00 $25,000.00
B $855.00 $19.00 -$1,254.00 -$281.00
C -$1,750.00 $4,200.00 $420.00 $527.00

This is powerful, because it allows us to understand the drivers of revenue growth. For example, between January 2018 and January 2019, revenue went up by $8,600 due an increase in clicks for group C. This is more insightful than just saying that their click rate went up.

One thing to keep in mind is that contributions sum up to the overall difference between two periods. This means that it's easy to unit test that the contributions are correct:

(
    explanation
    .groupby('year').sum().sum(axis=1)
    .to_frame('sum')
    .style.format(fmt_currency)
)
  sum
year  
2019 $51,427.00

Of course, it would be more interesting to apply this methodology to some real data. One example is the Google Analytics dataset sample which is publicly available in BigQuery.