
How to Automate Revenue Optimization to Boost Organic Traffic ROI (Part 1)

Building organic traffic is nice, but ROI is what really moves the needle, for SEO just as for any other marketing channel. This guide will teach you how to automate SEO revenue optimization to boost your organic traffic ROI.



How did I automate this process?


I coded a cool Python script that fetches data from different sources, blends it all together, and outputs a simple Excel file with a few tabs, based on various segmentations and data points.


What data do I use?

  1. Tableau – for Visits, Revenue, EPV (Earnings per Visit), CR (Conversion Rate), CTR (Click-Through Rate) and Unique Clicks

  2. Log files – for Googlebot hits (on 200 OK pages)

  3. Google Search Console – for average position per page


What am I trying to achieve?

My goal is to be able to easily identify revenue opportunities in the data, and provide a prioritized list of action items, focused on how we can leverage existing content to generate more revenue from the organic channel.


Additionally, this tool enables my coworkers, who are not familiar with coding or automation, to get the data in a simple format that can then be used for efficient data analysis.


What was my main challenge?

One of the main challenges I faced was analyzing the Tableau data. When exported to a CSV file, Tableau data looks like this:


Original Tableau Data

For each page, you’ll get the main KPIs (Measure Names), row by row.


Now, the screenshot is filtered to a single page for demonstration purposes, but clear the filter and multiply those 7-9 metric rows by the number of pages on your site, and you get a huge, long file that is very hard to analyze and work with.


Therefore, part of the code in this tool handles this case and pivots the Tableau data into a more readable format, with Page in the leftmost column and each KPI broken out into its own column.


In this form it is far more convenient to compare page metrics, sort, and filter. It also enables us to blend in metrics from other sources at the page level and get it all together in one file.
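
To make this concrete, here is a minimal pandas sketch of that long-to-wide pivot. This is not the tool's actual code (which does it with a series of merges, shown later), and the file path and column names are assumptions based on a typical Tableau export:

import pandas as pd

# A minimal sketch, assuming a long-format export with Page / Measure Names / Measure Values columns
df_long = pd.read_csv('data/source-tableau.csv',
                      names=['Page', 'Measure Names', 'Measure Values'], header=0)

# one row per page, one column per KPI; aggfunc='first' keeps the raw values as-is
df_wide = df_long.pivot_table(index='Page', columns='Measure Names',
                              values='Measure Values', aggfunc='first')
print(df_wide.head())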


Revenue Data
* The figures in the table above are not real and are used only for demonstration

Now, this could probably be solved in other, perhaps even less technical ways, but for me it was just another excuse to automate it :)


Once I cleaned and blended the data from all sources, I got this output:


SEO Data
The figures in the table above are not real and are used only for demonstration.

* Vertical: consider this as ‘Category’ for simplification (Vertical is a private case in MY data)

* Page Types include: Article page, Review page, Blog page, PPC page, Other


Weight & Weighted Avg EPV are not in the scope of this post, but in a few words, they let you make your comparison more accurate by transforming absolute data into relative data.


For example, a very high EPV on a page that receives only a small portion of the traffic does not mean a lot when you're trying to compare pages based on their EPV.
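
To illustrate, here is a toy sketch of the weighting, with made-up figures (the same computation appears in tableau.py below):

import pandas as pd

# Toy figures, for illustration only
df = pd.DataFrame({'Page': ['/a', '/b'],
                   'Visits': [10000, 100],
                   'EPV': [0.5, 4.0]}).set_index('Page')

# Weight = the page's absolute revenue contribution (Visits * EPV);
# dividing by total visits turns it into a traffic-relative EPV
df['Weight'] = df['Visits'] * df['EPV']
df['Weighted_avg_epv'] = df['Weight'] / df['Visits'].sum()
print(df)

Even though /b has an EPV eight times higher than /a, its weighted EPV comes out much lower, because it earns that EPV on a tiny share of the traffic.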


Additionally, I created a few graphs from this data, to better understand the big picture and to clearly share what I'm trying to achieve here with other stakeholders and higher management.


Here's an example of two scatter plots that show the correlation between revenue and position (on the left) and the correlation between revenue and visits (on the right). You can play with your data and build other interesting visualizations to demonstrate ROI.


Page Revenue by Position / Visits
* The figures in the graph above are not real and are used only for demonstration
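
If you'd like to reproduce similar plots, here is a minimal matplotlib sketch that reads the tool's final Excel output (the sheet and column names match the code later in this post; the styling is just an assumption):

import matplotlib.pyplot as plt
import pandas as pd

# read the merged page-level output produced by main.py
df = pd.read_excel('output/seo-data.xlsx', sheet_name='Pages', index_col=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.scatter(df['Position'], df['Revenue'], alpha=0.5)
ax1.set_xlabel('Avg. Position')
ax1.set_ylabel('Revenue')
ax1.set_title('Page Revenue by Position')

ax2.scatter(df['Visits'], df['Revenue'], alpha=0.5)
ax2.set_xlabel('Visits')
ax2.set_ylabel('Revenue')
ax2.set_title('Page Revenue by Visits')

plt.tight_layout()
plt.show()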


What action items did I take?


With the final output above, I started analyzing the different data segmentations, to derive the first few action items for increasing organic traffic revenue.


I might need to dedicate a separate post to the optimization process, because I have a lot to share on that aspect, but just to give you an idea, here are a few of the items I started with (see the filtering sketch after the list):

  1. Optimize pages with high visits & low revenue

  2. Optimize pages with high Gbot hits & low revenue

  3. Optimize pages with high visits & (relatively) low Gbot hits

  4. Quick win – push pages with traffic and revenue potential that rank at avg. position 11-20 (second page) up to the first page. Here you'll have to dive into keyword analysis to figure out which ones are relevant.

  5. Improve / add CTA elements on pages with low revenue, test and optimize more.

  6. Improve internal linking to important / money pages.

  7. Consider bringing up ‘rising’ traffic pages to main navigation / above the fold / sidebar widget.

  8. Run keyword and competitor research for those pages to improve existing content and find ideas for new content you’re not targeting yet.
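
Here is a sketch of how the first few items on that list can be pulled out of the merged output. The quantile cutoffs are arbitrary assumptions; tune them to your own data:

import pandas as pd

# read the merged page-level output produced by main.py
df = pd.read_excel('output/seo-data.xlsx', sheet_name='Pages', index_col=0)

high_visits = df['Visits'] >= df['Visits'].quantile(0.75)
low_revenue = df['Revenue'] <= df['Revenue'].quantile(0.25)
high_gbot = df['Gbot Hits'] >= df['Gbot Hits'].quantile(0.75)
low_gbot = df['Gbot Hits'] <= df['Gbot Hits'].quantile(0.25)

item_1 = df[high_visits & low_revenue]                     # high visits & low revenue
item_2 = df[high_gbot & low_revenue]                       # high Gbot hits & low revenue
item_3 = df[high_visits & low_gbot]                        # high visits & (relatively) low Gbot hits
item_4 = df[df['Position'].between(11, 20) & high_visits]  # second-page quick wins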


How did I prioritize the work?


Once I had a list of all the pages I'd like to optimize, I used a prioritization technique I recently picked up from a good friend. Now, there are many other ways to prioritize, but I chose this one because of its simplicity.


In this technique we transform absolute data into relative data. The one-liner below uses NumPy's random choice to label, for each row, the column holding the minimum value (breaking ties randomly):


h_data['MIN'] = [np.random.choice(h_data.columns[x == x.min()], 1)[0] for x in h_data.values]


Absolute Data



Relative Data



The lower the Total Points, the higher the priority, and that's where you'll start your optimization process.
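
Here is a minimal, self-contained sketch of the whole prioritization step. The figures are made up, and the rank() call is my reading of the absolute-to-relative transformation:

import numpy as np
import pandas as pd

# made-up metrics for three pages
h_data = pd.DataFrame({'Visits': [10000, 100, 5000],
                       'Revenue': [50, 400, 60],
                       'Position': [14, 3, 18]},
                      index=['/a', '/b', '/c'])

# the one-liner from above: label the column holding each row's minimum,
# with ties broken at random
h_data['MIN'] = [np.random.choice(h_data.columns[x == x.min()], 1)[0]
                 for x in h_data.values]

# absolute -> relative: rank each metric, then sum the ranks into Total Points;
# the lowest total is where the optimization starts
ranks = h_data[['Visits', 'Revenue', 'Position']].rank(method='min')
h_data['Total Points'] = ranks.sum(axis=1)
print(h_data.sort_values('Total Points'))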


And, for the fun part – the Python code


In terms of Python coding, this tool focuses on the revenue data (Tableau in this case), since I'm already using two other SEO automation tools that I previously developed, a log file analyzer and a Google Search Console data fetcher, to bring that data into this tool.


Nevertheless, code is important, and since I'm doing some manipulation on the log files and GSC data as well (after importing it from those other tools), it makes sense to dive into it.


But first… a few things to keep in mind

  1. The tool is semi-automated: for most parts I'm not fetching data from my cloud or any API, but from CSV files.

  2. Tableau – a CSV file that I download to my local Python project.

  3. Log files and GSC – CSV files generated automatically when I run the two tools mentioned above (the GSC fetcher tool does pull data using an API).

  4. ‘Charticle’ / ‘Superlist’ data points appearing in the code are a private case, so you might want to ignore them. They have no effect on your ability to run this code properly.

  5. In general, the code needs to be refactored, and is far from being perfect in terms of structure, syntax and efficiency, but it works and that's what matters!


Code Structure


The code is broken into four Python files (a rough project layout follows the list):

  1. gsc.py – fetches Google Search Console data

  2. logs.py – fetches log files data

  3. tableau.py – fetches Tableau data

  4. main.py – collects data from all of the above, cleans it, merges it and outputs an Excel file.
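
Based on the paths used throughout the code, the project layout looks roughly like this:

project/
├── main.py
├── gsc.py
├── logs.py
├── tableau.py
├── data/
│   ├── source-gsc.csv
│   ├── source-log-files.csv
│   ├── source-tableau.csv
│   └── source-page-type-superlist-charticle.csv
└── output/
    └── views/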


Google Search Console Data – gsc.py

  • Import GSC data from a CSV file

  • Group by page

  • Get the latest position (date-wise)

import pandas as pd

def get_gsc_data():
    path_gsc = 'data\\source-gsc.csv'
    gsc_column_names = ['Clicks', 'CTR', 'Impressions', 'Position', 'Page', 'Date']
    df_gsc = pd.read_csv(path_gsc, names=gsc_column_names, index_col=0)
    df_gsc['Page'] = df_gsc['Page'].str.replace('https://www.aaa.com/', '/', regex=False)  # strip the domain, keep the path
    df_gsc['Date'] = pd.to_datetime(df_gsc['Date'], errors='coerce')  # invalid dates become NaT
    return df_gsc


def group_gsc_by_page(df_gsc):
    # sort by date so 'last' picks the most recent position per page
    df_group_by_type = df_gsc.sort_values('Date').groupby(['Page']).agg(Position=('Position', 'last'))
    return df_group_by_type

Log Files Data – logs.py

  • Import log files data from a CSV file

  • Filter to 200 OK status code pages

  • Build another view for 404 pages (for later use)


import pandas as pd

def get_log_files_data():
    path_logs = 'data\\source-log-files.csv'
    logs_column_names = ['Date', 'Page', 'Status_Code', 'User_Agent', 'Ip']
    # read status codes as strings so the '200' / '404' comparisons below work
    df_logs = pd.read_csv(path_logs, names=logs_column_names, index_col=0, dtype={'Status_Code': str})
    return df_logs


# filter 200 OK responses only
def get_200_ok_status(df_logs):
    df_logs_200_status = df_logs.loc[df_logs.Status_Code == '200']
    return df_logs_200_status


# filter 404 responses (for later use)
def get_404_log_files(df_logs):
    df_logs_404_status = df_logs.loc[df_logs.Status_Code == '404']
    return df_logs_404_status


# group + count Googlebot hits by page
def group_log_files_by_page(df_logs_200_status):
    df_logs_group_by_page = df_logs_200_status.groupby(['Page']).size().rename('Gbot Hits')
    return df_logs_group_by_page

Tableau Data – tableau.py


import pandas as pd
from pandas import DataFrame


def get_tableau_page_data():
    path_tableau_by_page = 'data\\source-tableau.csv'
    tableau_page_column_names = ['landing_page', 'Measure Names', 'Measure Values', 'VISITs']
    df_tableau_by_page = pd.read_csv(path_tableau_by_page, names=tableau_page_column_names, index_col=0, header=0)

    # drop URL patterns we don't want to analyze (fragments, parameters, duplicates, non-page paths)
    for suffix in ['/reviews', '/article']:
        df_tableau_by_page = df_tableau_by_page[~df_tableau_by_page.index.str.endswith(suffix, na=False)]

    excluded_substrings = ['#', '%', '.php', 'renfff', '.aspx', 'localhost', '&', '_', '/api/',
                           'webcache.', 'www.bbb', 'www.bbb.com/www.bbb.com', 'bbbcom.bbb', '.html']
    for substring in excluded_substrings:
        # regex=False makes the match literal, so '.php' doesn't behave as a regex pattern
        df_tableau_by_page = df_tableau_by_page[~df_tableau_by_page.index.str.contains(substring, na=False, regex=False)]

    df_tableau_by_page.index = df_tableau_by_page.index.str.replace('www.bbb.com/', '/', regex=False)
    df_tableau_by_page.index = df_tableau_by_page.index.str.replace('www.bbb.com', '/', regex=False)  # remove 'www.bbb.com' from 'Page'
    df_tableau_by_page.rename(columns={'VISITs': 'Visits'}, inplace=True)
    df_tableau_by_page.index.name = 'Page'

    return df_tableau_by_page


def get_tableau_data_by_page(df_tableau_by_page):
    # slice the long-format Tableau data into one DataFrame per KPI
    df_tableau_visits_by_page = df_tableau_by_page[['VISITS' in x for x in df_tableau_by_page['Measure Names']]]
    df_tableau_revenue_by_page = df_tableau_by_page[['Revenue' in x for x in df_tableau_by_page['Measure Names']]]
    df_tableau_epv_by_page = df_tableau_by_page[['EPV' in x for x in df_tableau_by_page['Measure Names']]]
    df_tableau_cr_by_page = df_tableau_by_page[['CR' in x for x in df_tableau_by_page['Measure Names']]]
    df_tableau_ctr_by_page = df_tableau_by_page[['CTR' in x for x in df_tableau_by_page['Measure Names']]]
    df_tableau_unique_clicks_by_page = df_tableau_by_page[['Unique Clicks' in x for x in df_tableau_by_page['Measure Names']]]

    # merge the KPI slices back together on the Page index (long -> wide)
    df_merge_visits_by_page = pd.merge(df_tableau_visits_by_page, df_tableau_revenue_by_page, left_index=True, right_index=True)
    df_merge_epv_by_page = pd.merge(df_merge_visits_by_page, df_tableau_epv_by_page, left_index=True, right_index=True)
    df_merge_cr_by_page = pd.merge(df_merge_epv_by_page, df_tableau_cr_by_page, left_index=True, right_index=True)
    df_merge_ctr_by_page = pd.merge(df_merge_cr_by_page, df_tableau_ctr_by_page, left_index=True, right_index=True)
    df_merge_unique_clicks_by_page: DataFrame = pd.merge(df_merge_ctr_by_page, df_tableau_unique_clicks_by_page, left_index=True, right_index=True)
    df_merge_unique_clicks_by_page.drop(df_merge_unique_clicks_by_page.columns[[0, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17]], axis=1, inplace=True)  # drop redundant columns

    # the first URL path segment serves as the page's Vertical (category)
    url = list(df_merge_unique_clicks_by_page.index.values)
    df_merge_unique_clicks_by_page = df_merge_unique_clicks_by_page.assign(Vertical=url)
    vertical = list(df_merge_unique_clicks_by_page['Vertical'].str.split('/', expand=True)[1])
    df_merge_unique_clicks_by_page = df_merge_unique_clicks_by_page.assign(Vertical=vertical)

    df_merge_unique_clicks_by_page['Weight'] = ''
    df_merge_unique_clicks_by_page['Weighted_avg_epv'] = ''
    df_merge_unique_clicks_by_page.columns = ['Visits', 'Revenue', 'EPV', 'CR', 'CTR', 'Unique Clicks', 'Vertical', 'Weight', 'Weighted_avg_epv']

    df_merge_unique_clicks_by_page['Visits'] = df_merge_unique_clicks_by_page['Visits'].astype(int)
    df_merge_unique_clicks_by_page['EPV'] = df_merge_unique_clicks_by_page['EPV'].values.astype(int)  # note: truncates decimals; cast to float if your EPV is fractional

    # Weight = Visits * EPV; dividing by total visits gives a traffic-weighted EPV
    df_merge_unique_clicks_by_page['Weight'] = df_merge_unique_clicks_by_page['Visits'] * df_merge_unique_clicks_by_page['EPV']
    df_merge_unique_clicks_by_page['Weighted_avg_epv'] = df_merge_unique_clicks_by_page['Weight'] / df_merge_unique_clicks_by_page['Visits'].sum()

    df_merge_unique_clicks_by_page.to_csv('output\\views\\' + 'weight.csv', mode='w', header=True)
    return df_merge_unique_clicks_by_page


def get_page_type(df_merge_unique_clicks_by_page):
    df_tableau_page_type = df_merge_unique_clicks_by_page.assign(page_type='')
    path_superlist_charticle = 'data\\source-page-type-superlist-charticle.csv'
    superlist_charticle_column_names = ['Page', 'Page Type']

    df_superlist_charticle = pd.read_csv(path_superlist_charticle, names=superlist_charticle_column_names, index_col=0, header=0)

    superlist = df_superlist_charticle.loc[df_superlist_charticle['Page Type'] == 'Superlist']
    superlist_list = list(superlist.index.values)

    charticle = df_superlist_charticle.loc[df_superlist_charticle['Page Type'] == 'Charticle']
    charticle_list = list(charticle.index.values)

    # classify each page by its URL pattern (.loc avoids chained-assignment issues)
    for i in df_tableau_page_type.index:
        if str(i).startswith('/blog/'):
            df_tableau_page_type.loc[i, 'page_type'] = 'Blog'
        elif '/article/' in str(i):
            df_tableau_page_type.loc[i, 'page_type'] = 'Article'
        elif '/reviews/' in str(i):
            df_tableau_page_type.loc[i, 'page_type'] = 'Review'
        elif str(i).endswith('/comparison') or str(i).endswith('-comparison'):
            df_tableau_page_type.loc[i, 'page_type'] = 'PPC'
        elif str(i).endswith(tuple(superlist_list)):
            df_tableau_page_type.loc[i, 'page_type'] = 'Superlist'
        elif str(i).endswith(tuple(charticle_list)):
            df_tableau_page_type.loc[i, 'page_type'] = 'Charticle'
        else:
            df_tableau_page_type.loc[i, 'page_type'] = 'Other'

    df_tableau_page_type.rename(columns={'page_type': 'Page Type'}, inplace=True)
    return df_tableau_page_type


def group_data_by_type_vertical(df_tableau_page_type):
    df_group_by_type = df_tableau_page_type.groupby(['Vertical', 'Page Type']).agg(Visits=('Visits', 'sum'),
        Revenue=('Revenue', 'sum'), EPV=('EPV', 'mean'), CR=('CR', 'mean'), CTR=('CTR', 'mean'), UniqueClicks=('Unique Clicks', 'sum'))

    df_group_by_type = df_group_by_type.fillna(method='ffill', axis=1)
    df_group_by_type.to_csv('output\\' + 'Vertical - Page Types.csv', mode='w', header=True)
    return df_group_by_type


def group_data_by_vertical(df_tableau_page_type):
    df_group_by_vertical = df_tableau_page_type.groupby('Vertical').agg(Visits=('Visits', 'sum'),
        Revenue=('Revenue', 'sum'), EPV=('EPV', 'mean'), CR=('CR', 'mean'), CTR=('CTR', 'mean'), UniqueClicks=('Unique Clicks', 'sum'))

    return df_group_by_vertical


def group_data_by_type(df_tableau_page_type):
    df_group_by_type = df_tableau_page_type.groupby('Page Type').agg(Visits=('Visits', 'sum'),
        Revenue=('Revenue', 'sum'), EPV=('EPV', 'mean'), CR=('CR', 'mean'), CTR=('CTR', 'mean'), UniqueClicks=('Unique Clicks', 'sum'))

    return df_group_by_type


# filter to Charticle pages (used by main.py for the 'Charticles' sheet)
def group_data_by_charticle(df_tableau_page_type):
    df_group_by_charticle = df_tableau_page_type.loc[df_tableau_page_type['Page Type'] == 'Charticle']

    return df_group_by_charticle


def group_data_by_article(df_tableau_page_type):
    df_group_by_article = df_tableau_page_type.loc[df_tableau_page_type['Page Type'] == 'Article']

    return df_group_by_article


main.py

  • Bring data from the three other Python files (tableau.py, logs.py, gsc.py)

  • Merge the data

  • Output to an Excel file


import pandas as pd
import tableau
import logs
import gsc


def main():
    df_tableau = get_tableau()
    df_200_ok_logs = get_200_ok_logs()
    df_gsc = get_gsc()
    merge_data(df_tableau, df_200_ok_logs, df_gsc)


# bring Tableau data by page
def get_tableau():
    df_tableau = tableau.get_page_type(tableau.get_tableau_data_by_page(tableau.get_tableau_page_data()))
    return df_tableau


# bring 200 OK pages + Googlebot hits from log files
def get_200_ok_logs():
    df_200_ok_logs = logs.group_log_files_by_page(logs.get_200_ok_status(logs.get_log_files_data()))
    return df_200_ok_logs


# bring GSC position data
def get_gsc():
    df_gsc = gsc.group_gsc_by_page(gsc.get_gsc_data())
    return df_gsc


# merge Tableau, log files and GSC data by Page, then write the Excel output
def merge_data(df_tableau, df_200_ok_logs, df_gsc):
    merge_tableau_logs = pd.merge(df_tableau, df_200_ok_logs, left_index=True, right_index=True, how='left')
    df_merge_tableau_logs_gsc = pd.merge(merge_tableau_logs, df_gsc, left_index=True, right_index=True, how='left')

    # reuse the already-built page-level DataFrame instead of re-running the whole pipeline
    df_vertical = tableau.group_data_by_vertical(df_tableau)
    df_page_type = tableau.group_data_by_type(df_tableau)
    df_charticle = tableau.group_data_by_charticle(df_tableau)
    df_article = tableau.group_data_by_article(df_tableau)

    status_404 = logs.group_log_files_by_page(logs.get_404_log_files(logs.get_log_files_data()))
    df_404 = pd.merge(status_404, df_tableau, left_index=True, right_index=True, how='left')

    with pd.ExcelWriter('output\\' + 'seo-data.xlsx') as writer:
        df_merge_tableau_logs_gsc.to_excel(writer, header=True, sheet_name='Pages')
        df_vertical.to_excel(writer, header=True, sheet_name='Verticals')
        df_page_type.to_excel(writer, header=True, sheet_name='Page Types')
        df_charticle.to_excel(writer, header=True, sheet_name='Charticles')
        df_article.to_excel(writer, header=True, sheet_name='Articles')
        df_404.to_excel(writer, header=True, sheet_name='404 Pages')


if __name__ == '__main__':
    main()

Last Word


I will follow the success (or failure) of this strategy and these action items, and make sure to post an update with the results in a few weeks' time.


If you have any suggestions on how to further automate this tool, or on running a revenue optimization process, I'd love to hear about them.


