Analyze Facebook Ads Report with Pandas
TL;DR
Analyzing the performance of Facebook ad campaigns and giving recommendations on how to get the best performance out of them. The target feature is the conversion
of the listed ads. A similar data set can be found here: Kaggle source. This analysis also exists as a runnable Jupyter notebook HERE. Also, check out the original Jupyter Notebook.
Intro
This article is based on the core script of a home assignment from a Dutch company. A data set of 1,143 observations was provided to interpret, analyze, and convert into action points. The assignment asks that the insights I uncover be accompanied by action points and supported by reasoning.
The Data
The given data set is an ad performance report. I assume it was created by the Facebook Ads Manager. Accordingly, I will focus on marketing metrics like conversion and ad costs.
Goals
The company is looking for opportunities to exponentially increase its growth rate in markets outside of the Netherlands. This suggests identifying the top-converting ads and applying their learnings to other, low-performing ads at a measurably low cost.
- Identify high converting ads
- Find most efficient ads
- Conclude must-have plan for future ads
Data 💾
Data Extraction
# imports used throughout this notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

# load data frame and parse date columns
date_cols = ['reporting_start', 'reporting_end']
df = pd.read_csv('raw_data/data_for_data_analytics.csv',
                 parse_dates=date_cols)

# glimpse at the df
df.head()
Some manual data handling is required due to missing values in the lower ~380 rows.
# cells are shifted left from index 761 onwards, leaving wrong values in the remaining columns
# detect NA rows and shift them 2 cells to the right
na_mask = df.total_conversion.isna()
df.loc[na_mask, 'campaign_id':'approved_conversion'] = \
    df.loc[na_mask, 'campaign_id':'approved_conversion'].shift(2, axis=1)

# fill the now empty id columns with a placeholder
df[['campaign_id', 'fb_campaign_id']] = df[['campaign_id', 'fb_campaign_id']].fillna(-1)

# fix the dtypes
df = df.astype(dtype={'campaign_id': int,
                      'fb_campaign_id': int,
                      'interest1': int,
                      'interest2': int,
                      'interest3': int})

# show fixed area (to check)
df.loc[758:763]
Preliminary analysis
print(f'There are {df.ad_id.nunique()} unique values in "ad_id" (primary key).')
print(f'The overall reporting time span is {(df.reporting_end.max() - df.reporting_start.min()).days} days.')
print(f'In total {round(df.spent.sum(), 2)} EUR were spent on all campaigns in the recorded time.')

There are 1143 unique values in "ad_id" (primary key).
The overall reporting time span is 13 days.
In total 58705.23 EUR were spent on all campaigns in the recorded time.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1143 entries, 0 to 1142
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   ad_id                1143 non-null   int64
 1   reporting_start      1143 non-null   datetime64[ns]
 2   reporting_end        1143 non-null   datetime64[ns]
 3   campaign_id          1143 non-null   int64
 4   fb_campaign_id       1143 non-null   int64
 5   age                  1143 non-null   object
 6   gender               1143 non-null   object
 7   interest1            1143 non-null   int64
 8   interest2            1143 non-null   int64
 9   interest3            1143 non-null   int64
 10  impressions          1143 non-null   float64
 11  clicks               1143 non-null   int64
 12  spent                1143 non-null   float64
 13  total_conversion     1143 non-null   float64
 14  approved_conversion  1143 non-null   float64
dtypes: datetime64[ns](2), float64(4), int64(7), object(2)
memory usage: 134.1+ KB
Feature Engineering 👷
Feature categories
I assume the report's features fall into two important groups: preset_features and performance_features. They will be assigned accordingly:
- preset_features: ['ad_id', 'reporting_end', 'age', 'gender', 'interest1', 'interest2', 'interest3']
- performance_features: ['impressions', 'clicks', 'spent', 'approved_conversion']
That means each scheduled ad came with a designated audience targeting, and Facebook assigned the reported performance metrics to the advertisement.
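Written out as plain Python (the variable names here are my own shorthand and are not used by the later snippets):
# feature grouping as described above (illustrative)
preset_features = ['ad_id', 'reporting_end', 'age', 'gender',
                   'interest1', 'interest2', 'interest3']
performance_features = ['impressions', 'clicks', 'spent', 'approved_conversion']
# e.g. a quick look at the performance side only
df[performance_features].describe()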
Data cleaning
Some minor data cleaning will be done due to incomplete data and redundant features.
- I will solely focus on analyzing the individual ads, given that ~30% of the campaign_id data is missing. The data set stays at its current granularity; however, the columns campaign_id and fb_campaign_id can be removed. Once the missing data is acquired, it can still be added back to the set with no loss.
- The columns reporting_start and reporting_end are identical. One will be removed.
- The very limited context of the data set doesn't allow me to define the difference between the columns total_conversion and approved_conversion. Therefore, I will only use approved_conversion as a valid measure of success (e.g. 'sign up for bank account').
- Some rows show conversions without a click. This could derive from overlapping recording time windows. Since their origin is not fully clear, I will remove those rows (71 rows).
df_ = df.copy().drop(columns=['campaign_id',
                              'fb_campaign_id',
                              'reporting_start',
                              'total_conversion'])

# conversions without clicks are excluded
mask = (df_.clicks == 0) & (df_.approved_conversion > 0)
df_ = df_.loc[~mask]
Feature calculations
Some additional calculations will be necessary for further analysis:
- CPC (cost per click)
- CPCon (cost per approved_conversion)
- CPM (cost per 1,000 impressions)
- CTR (click-through rate)
- conversion_rate (approved_conversion / impressions)
- weekd (day of the week, starting 0 = Monday)
df_['CPC'] = df_.spent / df_.clicks
df_['CPCon'] = df_.apply(lambda x: (x.spent / x.approved_conversion) if x.approved_conversion != 0 else np.nan,
                         axis=1)  # avoid inf values with 0 as conversion
df_['CPM'] = round((df_.spent / df_.impressions) * 1000, 2)
df_['CTR'] = df_.clicks / df_.impressions
df_['conversion_rate'] = df_.approved_conversion / df_.impressions
df_['weekd'] = df_.reporting_end.dt.dayofweek  # 0 = Monday
Remove feature bias / create interest-sets
Columns interest1-interest3 hold information about the targeted interest groups. The IDs are defined by Facebook. I wasn't able to decode them; hence, they will stay encoded. However, these columns show duplicated entries: some rows carry the same trailing value in interest2 and interest3. I will remove these duplicates.
Additionally, I assume that interests are not prioritized (from 1-3), but rather given as an equal keyword pool. Therefore, they should be concatenated as unique sets in order to see which interest combinations perform best.
# create interest_set with unique interests
def interest_set(x):
    """
    Return interest1-interest3 concatenated into one unique interest set.
    The set is sorted to avoid permutation duplicates.
    """
    s = sorted({x.interest1, x.interest2, x.interest3})
    return '_'.join(str(i) for i in s)

df_['interest_set'] = df_.apply(interest_set, axis=1)
# keep single interests but remove trailing duplicates
df_clean = df_.copy()
df_clean.loc[df_clean.interest2 == df_clean.interest3, 'interest3'] = -1

df_clean.head()
Data Analysis 📈
Socio analysis 🌈
plt.figure(figsize=(15,8))
# display CTR and conversion rate in violin plots
for i, metric in enumerate(['CTR', 'conversion_rate']):
    plt.subplot(1, 2, i+1)
    sns.violinplot(data=df_clean[df_clean[metric] != 0],
                   x='age',
                   y=metric,
                   hue='gender',
                   orient='vertical',
                   split=True)
    # give hints on significance / outlier behavior / balance
    sns.stripplot(x='age',
                  y=metric,
                  data=df_clean[df_clean[metric] != 0],
                  hue='gender',
                  jitter=True,
                  alpha=.3)
    plt.title(f'{metric.capitalize()} Analysis by Socio Factors',
              size=15)
    plt.grid(axis='y')
plt.show()
Disclaimer
- Data with 0 as the target value have been removed (no-performers).
Note
- Female CTR (mode, mean and median) is consistently higher across all age groups.
- Older generations are more inclined to click.
- Younger generations, however, are the ones who actually convert.
- High variances are visible; let's have a look at the outliers. A quick numeric check follows below.
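To back the visual impression with numbers, here is a minimal sketch on df_clean, using the median as a robust summary and excluding no-performers, as in the plot:
# median CTR and conversion rate per socio group
socio = (df_clean[df_clean.CTR != 0]
         .groupby(['age', 'gender'])[['CTR', 'conversion_rate']]
         .median())
print(socio)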
Day-by-Day performance 🗓
# display weekday performance
plt.figure(figsize=(10,7))
sns.barplot(data=df_clean,
            y='conversion_rate',
            x='weekd')
plt.title('Conversion per weekday',
          size=15)
plt.grid(axis='y')
plt.show()
Note
- Tuesdays and Fridays (1 and 4) perform best on average (a quick numeric check follows below).
- Wednesdays and Saturdays show the least user engagement (with high variance).
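The averages behind the bar chart, as a quick sanity check:
# mean conversion rate per weekday (0 = Monday)
print(df_clean.groupby('weekd').conversion_rate.mean().round(5))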
Interest performance 🤹
# group df by interest_sets
df_interests = (df_clean.groupby('interest_set')
                        .mean(numeric_only=True)  # skip the object columns (age, gender)
                        .sort_values(by='conversion_rate', ascending=False))
# display top20 interest sets
plt.figure(figsize=(15,6))
sns.barplot(x=df_interests.head(20).index,
            y=df_interests.head(20).conversion_rate)
plt.xticks(rotation=90)
plt.grid(axis='y')
plt.title('Top 20 Interest Sets by Conversion Rate', size=15)
plt.show()
Note
- Interest sets 30_33_34 and 21_24_26 show the best conversion performance.
# create df with the most expensive ads without conversion
interest_flops = df_interests[(df_interests.spent > 0)
                              & (df_interests.approved_conversion == 0.0)] \
                 .sort_values(by='spent', ascending=False)
# plot top30
plt.figure(figsize=(15,6))
sns.barplot(data=interest_flops.head(30),
            x=interest_flops.head(30).index,
            y='spent')
plt.xticks(rotation=90)
plt.ylabel('Avg. Ad Cost (EUR)')
plt.title('Top 30 Most Expensive Interest Sets with no Conversion', size=15)
plt.show()
Note
- The set 27_31_33 was the most expensive and thus least efficient interest set in the data set.
# check the top 100 converting ads and their interests
df_conversion = df_clean.sort_values(by='conversion_rate', ascending=False).head(100)
df_conversion_low = df_clean.sort_values(by='conversion_rate', ascending=True).head(100)
# concatenate all available values
plt.figure(figsize=(15,10))
for i, (k, part) in enumerate({'High performing': df_conversion,
                               'Low performing': df_conversion_low}.items()):
    # pool all interest columns and count the occurrences of each interest id
    new_lst = list(part.interest1.values) + list(part.interest2.values) + list(part.interest3.values)
    count = dict(Counter(new_lst))
    # create df
    interest_df = pd.DataFrame(list(count.items()), columns=['interest', 'count'])
    interest_df = interest_df.sort_values(by='count', ascending=False).reset_index(drop=True)
    interest_df = interest_df[interest_df['interest'] != -1]  # drop the -1 fill values
    # plot
    plt.subplot(2, 1, i+1)
    sns.barplot(data=interest_df,
                x='interest',
                y='count',
                order=interest_df['interest'])
    plt.title(f'{k} Interest Groups by Conversion', size=15)
    plt.grid(axis='y')
plt.tight_layout()
plt.show()
Note
- Interest group 16 appears 20 times in the top 100 converting ads.
- Interest group 32 appears most often in the low-performing ads.
Impressions performance 👀
Note
- Higher bids lead to higher impressions and thus more (paid) clicks; the correlation is clearly visible (a quick numeric check follows below).
- There are some outliers to be inspected, like ad 1122265.
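Since the underlying plot isn't reproduced here, a minimal sketch of how the relationship can be checked on df_clean; highlighting ad 1122265 in red is just for illustration:
# correlation between ad spend, impressions and clicks
print(df_clean[['spent', 'impressions', 'clicks']].corr())

# visual check, with the suspected outlier highlighted
ax = df_clean.plot.scatter(x='spent', y='impressions', alpha=.4)
outlier = df_clean[df_clean.ad_id == 1122265]
ax.scatter(outlier.spent, outlier.impressions, color='red', label='ad 1122265')
ax.legend()
plt.show()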
Set Cost goals 🎯
Let's have a look at some excerpts from the data set to set a goal for future ads and to corroborate the above findings.
- The cheapest 👼 impressions which still led to conversions:
# filter by CPM (cost per 1,000 impressions), excluding the non-converting / non-clicked data
df_clean[df_clean.CPM != 0].sort_values(by='CPM').head()
Note
- All of the top 5 cheapest costs per impression target a male audience, with an overrepresentation of interest groups 16 and 17.
- The worst (most expensive) 👺 ads by cost per conversion:
df_clean[df_clean.CPCon != 0].sort_values(by='CPCon', ascending=False).head(10)
Note
To avoid highly clicked and costly ads with no conversion:
- Avoid weekends.
- Avoid interest group 32 overall.
- Avoid age group 45-49.
- The CPCon range:
# show cheapest and most expensive conversions
df_clean.CPCon.describe()count 513.000000
mean 46.165392
std 55.627454
min 0.180000
25% 7.540000
50% 29.910000
75% 60.730000
max 352.449999
Name: CPCon, dtype: float64
Findings and Recommendations 💎
- The cost per conversion ranges between 0.18 and 352.45 EUR; about 75% of conversions cost less than ~61 EUR. Highly costly outliers show that some campaigns were lacking maintenance.
- Weekdays play a significant role for ad performance.
- Some targeted interest groups lead to high costs (clicks) but low conversion.
- There is a significant difference between age and gender groups.
Best practices for well-converting ads:
- Use Tuesdays and Fridays as preferred scheduling days. Details
- Interest sets 30_33_34 and 21_24_26 show the best conversion performance. Details
- Interest group 16 seems like a good general catch. Details
- The female audience could be targeted more strongly due to better conversions, provided the company's fairness policy allows that. Details
To avoid highly clicked and costly ads with no conversion (Details):
- Avoid weekends.
- Avoid interest group 32 overall.
- Avoid age group 45-49 (window shoppers).
- Create an alert for cost-exploding ads or set budget limits (a simple sketch follows below).