Simple revenue funnel 🛒¶

We look at a toy website funnel in this example. Imagine a fictitious website that sells stuff. Users go to the website, are presented with items, can add them to their cart, and then can buy them.

import pandas as pd
import locale

locale.setlocale(locale.LC_MONETARY, 'en_US.UTF-8')
def fmt_currency(x):
    return locale.currency(x, grouping=True)

traffic = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-01', '2018-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2018-02-01', '2018-02-01', '2018-02-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'impressions': [1000, 2000, 2500, 1000, 2150, 2000, 50, 2000, 2500, 2500, 2150, 2000],
    'clicks': [150, 150, 250, 120, 200, 400, 20, 300, 250, 1000, 323, 320],
    'conversions': [120, 150, 125, 160, 145, 166, 10, 150, 125, 500, 145, 166],
    'revenue': ['$8,600', '$9,400', '$10,750', '$9,055', '$8,739', '$10,147', '$500', '$11,400', '$8,750', '$50,000', '$10,739', '$12,147'],
})
traffic['date'] = pd.to_datetime(traffic['date'])
traffic['revenue'] = traffic['revenue'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
traffic.style.format({'revenue': fmt_currency, 'date': lambda x: x.strftime('%Y-%m-%d')}, na_rep='N/A')

	date	group	impressions	clicks	conversions	revenue
0	2018-01-01	A	1000	150	120	$8,600.00
1	2018-01-01	B	2000	150	150	$9,400.00
2	2018-01-01	C	2500	250	125	$10,750.00
3	2019-01-01	A	1000	120	160	$9,055.00
4	2019-01-01	B	2150	200	145	$8,739.00
5	2019-01-01	C	2000	400	166	$10,147.00
6	2018-02-01	A	50	20	10	$500.00
7	2018-02-01	B	2000	300	150	$11,400.00
8	2018-02-01	C	2500	250	125	$8,750.00
9	2019-02-01	A	2500	1000	500	$50,000.00
10	2019-02-01	B	2150	323	145	$10,739.00
11	2019-02-01	C	2000	320	166	$12,147.00

The users are bucketed into 3 groups: A, B, C. We've also bucketed impressions/clicks/conversions/revenue figures by month of the year.

We're interested in understanding how the metrics evolve over time. The basic method is to calculate each metric separately. To keep things simple, we can do this for each year.

pd.DataFrame({
    'impressions': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .impressions.sum()
    ),
    'click_rate': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .apply(lambda x: x.clicks.sum() / x.impressions.sum(), include_groups=False)
    ),
    'conversion_rate': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .apply(lambda x: x.conversions.sum() / x.clicks.sum(), include_groups=False)
    ),
    'average_spend': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .apply(lambda x: x.revenue.sum() / x.conversions.sum(), include_groups=False)
    ),
    'revenue': (
        traffic
        .assign(year=traffic.date.dt.year)
        .groupby('year')
        .revenue.sum()
    )
}).style.format({'average_spend': fmt_currency, 'revenue': fmt_currency}, na_rep='')

	impressions	click_rate	conversion_rate	average_spend	revenue
year
2018	10050	0.111443	0.607143	$72.65	$49,400.00
2019	11800	0.200254	0.542531	$78.65	$100,827.00

In and of itself, this is already quite interesting. However, what we really want to know is how the change of each metric contributes to the change in revenue. This is where icanexplain comes in.

import icanexplain as ice

explainer = ice.FunnelExplainer(
    funnel=['impressions', 'clicks', 'conversions', 'revenue'],
    period='year',
    group=['month', 'group']
)
traffic = traffic.assign(
    month=traffic.date.dt.month,
    year=traffic.date.dt.year
)
explanation = explainer(traffic)
explanation.style.format(fmt_currency).set_properties(**{'text-align': 'right'})

			impressions_contribution	clicks_over_impressions_contribution	conversions_over_clicks_contribution	revenue_over_conversions_contribution
year	month	group
2019	1	A	$0.00	-$1,720.00	$4,586.67	-$2,411.67
		B	$705.00	$2,428.33	-$3,446.67	-$347.67
		C	-$2,150.00	$8,600.00	-$2,924.00	-$4,129.00
	2	A	$24,500.00	$0.00	$0.00	$25,000.00
		B	$855.00	$19.00	-$1,254.00	-$281.00
		C	-$1,750.00	$4,200.00	$420.00	$527.00

This is powerful, because it allows us to understand the drivers of revenue growth. For example, between January 2018 and January 2019, revenue went up by $8,600 due an increase in clicks for group C. This is more insightful than just saying that their click rate went up.

One thing to keep in mind is that contributions sum up to the overall difference between two periods. This means that it's easy to unit test that the contributions are correct:

(
    explanation
    .groupby('year').sum().sum(axis=1)
    .to_frame('sum')
    .style.format(fmt_currency)
)

	sum
year
2019	$51,427.00

Of course, it would be more interesting to apply this methodology to some real data. One example is the Google Analytics dataset sample which is publicly available in BigQuery.