Simple revenue funnel 🛒¶
We look at a toy website funnel in this example. Imagine a fictitious website that sells stuff. Users go to the website, are presented with items, can add them to their cart, and then can buy them.
import pandas as pd
import locale
locale.setlocale(locale.LC_MONETARY, 'en_US.UTF-8')
def fmt_currency(x):
return locale.currency(x, grouping=True)
traffic = pd.DataFrame({
'date': ['2018-01-01', '2018-01-01', '2018-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2018-02-01', '2018-02-01', '2018-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
'group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
'impressions': [1000, 2000, 2500, 1000, 2150, 2000, 50, 2000, 2500, 2500, 2150, 2000],
'clicks': [150, 150, 250, 120, 200, 400, 20, 300, 250, 1000, 323, 320],
'conversions': [120, 150, 125, 160, 145, 166, 10, 150, 125, 500, 145, 166],
'revenue': ['$8,600', '$9,400', '$10,750', '$9,055', '$8,739', '$10,147', '$500', '$11,400', '$8,750', '$50,000', '$10,739', '$12,147'],
})
traffic['date'] = pd.to_datetime(traffic['date'])
traffic['revenue'] = traffic['revenue'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
traffic.style.format({'revenue': fmt_currency, 'date': lambda x: x.strftime('%Y-%m-%d')}, na_rep='N/A')
date | group | impressions | clicks | conversions | revenue | |
---|---|---|---|---|---|---|
0 | 2018-01-01 | A | 1000 | 150 | 120 | $8,600.00 |
1 | 2018-01-01 | B | 2000 | 150 | 150 | $9,400.00 |
2 | 2018-01-01 | C | 2500 | 250 | 125 | $10,750.00 |
3 | 2019-01-01 | A | 1000 | 120 | 160 | $9,055.00 |
4 | 2019-01-01 | B | 2150 | 200 | 145 | $8,739.00 |
5 | 2019-01-01 | C | 2000 | 400 | 166 | $10,147.00 |
6 | 2018-02-01 | A | 50 | 20 | 10 | $500.00 |
7 | 2018-02-01 | B | 2000 | 300 | 150 | $11,400.00 |
8 | 2018-02-01 | C | 2500 | 250 | 125 | $8,750.00 |
9 | 2019-02-01 | A | 2500 | 1000 | 500 | $50,000.00 |
10 | 2019-02-01 | B | 2150 | 323 | 145 | $10,739.00 |
11 | 2019-02-01 | C | 2000 | 320 | 166 | $12,147.00 |
The users are bucketed into 3 groups: A, B, C. We've also bucketed impressions/clicks/conversions/revenue figures by month of the year.
We're interested in understanding how the metrics evolve over time. The basic method is to calculate each metric separately. To keep things simple, we can do this for each year.
pd.DataFrame({
'impressions': (
traffic
.assign(year=traffic.date.dt.year)
.groupby('year')
.impressions.sum()
),
'click_rate': (
traffic
.assign(year=traffic.date.dt.year)
.groupby('year')
.apply(lambda x: x.clicks.sum() / x.impressions.sum(), include_groups=False)
),
'conversion_rate': (
traffic
.assign(year=traffic.date.dt.year)
.groupby('year')
.apply(lambda x: x.conversions.sum() / x.clicks.sum(), include_groups=False)
),
'average_spend': (
traffic
.assign(year=traffic.date.dt.year)
.groupby('year')
.apply(lambda x: x.revenue.sum() / x.conversions.sum(), include_groups=False)
),
'revenue': (
traffic
.assign(year=traffic.date.dt.year)
.groupby('year')
.revenue.sum()
)
}).style.format({'average_spend': fmt_currency, 'revenue': fmt_currency}, na_rep='')
impressions | click_rate | conversion_rate | average_spend | revenue | |
---|---|---|---|---|---|
year | |||||
2018 | 10050 | 0.111443 | 0.607143 | $72.65 | $49,400.00 |
2019 | 11800 | 0.200254 | 0.542531 | $78.65 | $100,827.00 |
In and of itself, this is already quite interesting. However, what we really want to know is how the change of each metric contributes to the change in revenue. This is where icanexplain comes in.
import icanexplain as ice
explainer = ice.FunnelExplainer(
funnel=['impressions', 'clicks', 'conversions', 'revenue'],
period='year',
group=['month', 'group']
)
traffic = traffic.assign(
month=traffic.date.dt.month,
year=traffic.date.dt.year
)
explanation = explainer(traffic)
explanation.style.format(fmt_currency).set_properties(**{'text-align': 'right'})
impressions_contribution | clicks_over_impressions_contribution | conversions_over_clicks_contribution | revenue_over_conversions_contribution | |||
---|---|---|---|---|---|---|
year | month | group | ||||
2019 | 1 | A | $0.00 | -$1,720.00 | $4,586.67 | -$2,411.67 |
B | $705.00 | $2,428.33 | -$3,446.67 | -$347.67 | ||
C | -$2,150.00 | $8,600.00 | -$2,924.00 | -$4,129.00 | ||
2 | A | $24,500.00 | $0.00 | $0.00 | $25,000.00 | |
B | $855.00 | $19.00 | -$1,254.00 | -$281.00 | ||
C | -$1,750.00 | $4,200.00 | $420.00 | $527.00 |
This is powerful, because it allows us to understand the drivers of revenue growth. For example, between January 2018 and January 2019, revenue went up by $8,600 due an increase in clicks for group C. This is more insightful than just saying that their click rate went up.
One thing to keep in mind is that contributions sum up to the overall difference between two periods. This means that it's easy to unit test that the contributions are correct:
sum | |
---|---|
year | |
2019 | $51,427.00 |
Of course, it would be more interesting to apply this methodology to some real data. One example is the Google Analytics dataset sample which is publicly available in BigQuery.