RETHINKING MODERN WEB ANALYTICS
Table of Contents
1. The state of the web analytics landscape in 2021, and what has changed
CHAPTER 1
THE STATE OF THE WEB ANALYTICS LANDSCAPE IN 2021, AND WHAT HAS CHANGED
In the early 2000s, the web was a much simpler place. We had mostly static
websites with little interactivity. JavaScript and CSS, the languages that give
the web its gloss and make it a more engaging experience, were far less
sophisticated than they are now. Not only was the technology underpinning
the web much more basic than today, the way people used the web was
much simpler. Almost nobody had what we would describe today as a
‘smartphone’, and even if they did, wireless internet access was very restrictive,
as were the websites you could visit on it (the iPhone wasn’t released until
2007). Most people accessed the web through a single computer, usually
one per household.
Content consumption on the internet was almost exclusively text- and
image-based, and primarily text. Connection speeds were nowhere near fast
enough to allow for reliable on-demand music or video playback.
Not only was it different from a user’s perspective, it was also different for
brands and businesses. A number of brands didn’t have any online presence
at all (certainly not on social media). Even for those that did, online marketing
and advertising options were limited to static banner ads (paid for by
reserving a spot on a website for a finite period of time), email newsletters
and paid search advertising through Google (which, by this time at least, had
established itself as the number one search engine).
A web revolution
Over time, however, the way people use the web has evolved significantly.
The iPhone, released in 2007, changed the way users browsed both on the go
and at home, as did, to a lesser extent, the iPad in 2010. Internet connections
(both broadband and mobile networks) became far faster and more reliable,
which enabled on-demand video and music streaming, as well as live
streaming.
Web technologies and frameworks (such as React, Angular and Vue) have
been developed that enable web applications which were little more than
pipe dreams in the early 2000s. You can buy more items online today than
ever before (cars, stocks/shares/options, groceries, ISAs, etc.), which
brings users to the web more frequently. Not only that, we can now manage
our finances through online and mobile banking, start relationships through
online dating apps and services, track our health with fitness apps, and so on.
The same is true for businesses (handling finances through Xero or
QuickBooks, customer support through Zendesk, virtual events using
BrightTALK, etc.). We have more reasons to use the internet and the web than
ever before.
And since the variety of things people can use or buy online is now so vast,
and most people use the web on multiple devices, the customer journey is
more complex than it has ever been. Research Online, Purchase Offline
(ROPO) behavior is very common nowadays. A user might start their journey
on mobile after seeing a dynamically targeted video ad in a social media app
during their commute, then, after receiving a triggered marketing email,
research and purchase in a desktop web browser at home. Journeys like this,
and far more complex ones, are commonplace today.
All of these societal and technological changes mean that the websites
analysts are analyzing today look and behave very differently from the days
when websites were just static pages with a few buttons and forms.
And yet, most of the popular web analytics tools out there today still use data
models and frameworks designed for simple websites, with users who only
have a single device.
Figure: example customer journey touchpoints (view banner ad, view print ad, watch YouTube ad, read blog, read reviews, post reviews, like on Facebook, compare online, call centre, download app, shop online, shop in-store, purchase online, purchase via mobile, purchase through app).
Secondly, consider a web application that doesn’t fit neatly into the page
view -> session -> user framework, such as twitter.com. A user can
visit twitter.com/home, scroll through their timeline, hover over users’
avatars to see a profile card, like and retweet individual tweets, and follow or
unfollow users, all from this single page, which also auto-refreshes their feed.
All of this can happen within a single page view in the traditional sense,
since the page never reloads. If Twitter were using GA for their web analytics,
then without extensive customization they would likely see a high
proportion of “sessions” consisting of only a few page views, and high
bounce rates. The standard data model enforced by most web analytics
tools doesn’t fit the web of today.
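To make the mismatch concrete, here is a minimal sketch that is not taken from any real analytics product. It assumes a hypothetical event log for one visit to a single-page app and a simplified definition of a bounce as "a session with exactly one page view", then contrasts that with simply counting the interactions that happened.

```python
# Hypothetical event log for one visit to a single-page app: the page never
# reloads, so only one "page_view" is ever recorded.
session_events = [
    {"type": "page_view", "url": "/home"},
    {"type": "scroll", "depth_pct": 80},
    {"type": "hover_profile_card", "target": "user_123"},
    {"type": "like", "target": "tweet_456"},
    {"type": "follow", "target": "user_789"},
    {"type": "feed_refresh"},
]

def is_bounce_pageview_model(events):
    """Simplified page-view model: a session with a single page view bounces."""
    return sum(e["type"] == "page_view" for e in events) == 1

def interaction_count(events):
    """Event-based view: count every interaction that is not a page view."""
    return sum(e["type"] != "page_view" for e in events)

print(is_bounce_pageview_model(session_events))  # True
print(interaction_count(session_events))         # 5
```

The same visit is a "bounce" under the page-view model while containing five meaningful interactions, which is why event-level data is a better fit for applications like this.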
The standardized page view -> session -> user paradigm doesn’t fit a lot of
web experiences in 2021. The BMW car customiser, an online learning
provider like Udemy or a streaming service like Twitch are all web
applications for which the standard web analytics data model makes no
sense anymore.
It’s worth noting that many websites do still fit this model. Most publisher
and ecommerce websites generally do, for instance, since users read articles,
or move between search results pages, product pages, and checkout and
confirmation pages. For those businesses, the model still works well.
However, a growing number of businesses do not fit this model, and even
publishers and ecommerce businesses are starting to move their web
experiences away from what we might call a “traditional” website model.
Overall, many web analytics tools are poorly equipped to provide the deep
level of detail required to understand user behaviour across complex web
user journeys. This is neither a revelation nor an unpopular opinion.
The challenge has been taken up by the largest player in the market, Google,
in an attempt to help analysts better answer the questions that traditional
web analytics tools struggle with.
This major change from Google shows that they acknowledge the need for a
fresh look at web analytics. And the fact that GA4 is built on Firebase,
originally a tool for tracking interactions in mobile apps, shows how Google
sees the two worlds coming closer together.
To summarise, web analytics tools have struggled to keep pace with the
changes in user behavior and technology over the last 10-15 years. Web
analytics solutions need to give businesses and analysts the ability to
customise their tracking and data models to fit the web applications their
customers actually use, so they can fully understand the behaviour those
users exhibit. Without the understanding that comes from rich, detailed
behavioural data, businesses cannot expect to provide the best user
experience across all touchpoints.
In the upcoming chapters of this eBook, we will cover some of the big topics
and challenges that need to be addressed to take web analytics to the next
level and gain the most value from your behavioural web data:
CHAPTER 2
PRIVACY UPDATES, AD BLOCKERS, AND THE NEED FOR 1ST-PARTY TRACKING
Private Browsing
Web browsers are implementing clever features designed to protect users by
preventing websites from tracking them. Behind the scenes, browsers are
removing tracking parameters from URLs, stripping or spoofing referrer
information, and setting strict limits on how websites can interact with a
user’s browser storage via cookies.
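As an illustration of the first of these techniques, the sketch below removes click-ID parameters from a URL using only Python's standard `urllib.parse`. The parameter list is an illustrative assumption: each browser maintains its own rules, and this set is not any browser's actual list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative set of click-ID parameters (an assumption, not any browser's
# real list). Browsers that strip parameters each maintain their own rules.
TRACKING_PARAMS = {"fbclid", "gclid", "msclkid", "mc_eid"}

def strip_tracking_params(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # urlunsplit drops the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_tracking_params("https://shop.example/item?id=42&fbclid=abc123"))
# https://shop.example/item?id=42
```

Once the click ID is gone, any analytics or attribution logic that relied on reading it from the landing-page URL simply never sees it.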
Cookies
As a refresher, cookies are small pieces of data stored in the browser that
maintain specific states as a visitor navigates from one page to the next. Cookies
make sure your visitors stay logged in and keep their items in the shopping cart
as they browse your website. Cookies are often referred to as first or third party,
but it’s more accurate to describe their context: the circumstances under which
the cookie was written to a visitor’s browser. From Cookie Status:
Here, “eTLD+1” refers to the effective top-level domain plus one more label. For
example, the eTLD+1 of blog.snowplowanalytics.com is snowplowanalytics.com
(the eTLD, com, plus one label, snowplowanalytics). Cookies with a first-party
context (“first-party” cookies) are those exchanged between pages that share
an eTLD+1, e.g. navigating from blog.snowplowanalytics.com/post_1 to
blog.snowplowanalytics.com/post_2. A third-party context occurs between pages
that don’t share an eTLD+1, like your email service provider’s subscription form
popping up in an iframe, or a restaurant’s menu PDF being served directly from
S3 via s3.amazonaws.com.
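The eTLD+1 comparison can be sketched in a few lines of Python. This is a simplification under a loud assumption: the suffix set below is a tiny hypothetical subset, whereas browsers consult the full Public Suffix List, and the restaurant domain is made up for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Tiny, hypothetical subset of public suffixes for illustration only.
# Real browsers consult the full Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "net", "co.uk"}

def etld_plus_one(host: str) -> Optional[str]:
    """Return the eTLD+1 (registrable domain) of a host name, if any."""
    labels = host.lower().rstrip(".").split(".")
    # Check candidate suffixes from longest to shortest, so a multi-label
    # suffix such as "co.uk" is matched before a shorter one would be.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None

def same_site(url_a: str, url_b: str) -> bool:
    """True when both URLs share an eTLD+1, i.e. a first-party context."""
    site_a = etld_plus_one(urlsplit(url_a).hostname or "")
    site_b = etld_plus_one(urlsplit(url_b).hostname or "")
    return site_a is not None and site_a == site_b

print(etld_plus_one("blog.snowplowanalytics.com"))  # snowplowanalytics.com
print(same_site("https://blog.snowplowanalytics.com/post_1",
                "https://blog.snowplowanalytics.com/post_2"))  # True
print(same_site("https://www.example-restaurant.com/",
                "https://s3.amazonaws.com/menus/menu.pdf"))    # False
```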
This means that if someone visits your website to browse your products, then
comes back ten days later to make a purchase, that second visit looks like
a new person to your analytics.
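A small simulation shows why the ten-day gap matters. This sketch assumes a 7-day cookie lifetime (the figure this chapter attributes to Apple later on) versus a hypothetical 2-year lifetime, and it assumes the tracker rewrites the cookie, and so refreshes its expiry, on every visit. Both assumptions are labeled in the code.

```python
from datetime import date, timedelta

def count_apparent_visitors(visit_dates, cookie_lifetime):
    """Count how many distinct visitor IDs one real person generates.

    Assumption: the cookie is rewritten (expiry refreshed) on every visit,
    and a new visitor ID is minted whenever the previous cookie has expired.
    """
    visitors = 0
    expires = None
    for day in sorted(visit_dates):
        if expires is None or day >= expires:
            visitors += 1  # cookie gone: analytics sees a "new" person
        expires = day + cookie_lifetime  # cookie (re)written on this visit
    return visitors

visits = [date(2021, 3, 1), date(2021, 3, 11)]  # browse, then buy 10 days later
print(count_apparent_visitors(visits, timedelta(days=7)))    # 2 (assumed 7-day cap)
print(count_apparent_visitors(visits, timedelta(days=730)))  # 1 (hypothetical 2 years)
```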
ETP
Mozilla introduced Enhanced Tracking Protection (ETP) into its Firefox browser in
2018 and enabled this privacy-focused suite of features by default in 2019.
Like ITP, ETP blocks third-party tracking cookies. As of ETP 2.0, Firefox
deletes tracking cookies every 24 hours, compared with Apple’s comparatively
generous seven days. ETP extends a grace period for websites you visit
frequently, like search engines or social media, storing those first-party cookies
for 45 days (or indefinitely, depending on how often you visit the site).
Ad blockers
Even if you don’t advertise, ad blockers may be having a significant impact
on your web analytics. Ad blockers work like other tracking prevention: as a
page loads, they check the scripts it requests against a list of domains to block.
Depending on the implementation and the ad blocker, tracking scripts from
Google Analytics or other on-page analytics platforms can be caught by these filters.
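The domain-list check can be sketched as follows. The blocklist is hypothetical; real ad blockers use much larger community filter lists with a richer rule syntax (paths, wildcards, exceptions), so this only shows the core idea of matching a script's host and its parent domains.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real ad blockers load large filter lists with
# richer rule syntax than plain domain names.
BLOCKED_DOMAINS = {"google-analytics.com", "doubleclick.net", "tracker.example"}

def is_blocked(script_url: str, blocklist=BLOCKED_DOMAINS) -> bool:
    """True if the script's host, or any parent domain of it, is listed."""
    host = (urlsplit(script_url).hostname or "").lower()
    labels = host.split(".")
    # Matching parent domains means a rule for "google-analytics.com"
    # also catches "www.google-analytics.com".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

print(is_blocked("https://www.google-analytics.com/analytics.js"))  # True
print(is_blocked("https://collector.example.com/tp2"))              # False
```

Because the match is on domain, a collector served from a domain that is not on the filter lists is far less likely to be caught, which is the motivation for the first-party approach discussed next.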
Your analytics will have distortions and gaps if you rely on cookies set in a
third-party context, or by many well-known tracking and analytics services.
First-party data collection platforms like Snowplow use cookies set server-side
(in a first-party context), which leaves them unaffected by ITP, ETP, or most
other tracking prevention. In an experiment run by Moz to measure traffic
obscured by ad blockers or browser tracking prevention, the discrepancy in
traffic volume ranged from 5% to 30%.
Because they are not limited by the same short expiration dates, cookies set
server-side provide a source of rich, detailed behavioral data that businesses
can use to make more informed decisions. Just as important, setting cookies
this way preserves user privacy. Server-side set cookies are currently the most
reliable way to track anonymous visitors to your website.
CHAPTER 3
BUILDING A WEB ANALYTICS STACK – PACKAGED VS MODULAR
According to a traffic usage survey, in 2008 Google Analytics was used by 55.1%
of all websites, amounting to a market share of 84.3% among analytics tools.
Despite many new players entering the industry, GA has managed to hold onto
its dominance today, with an eye-watering 75% of the market. It’s fair to say
that GA continues to be the go-to tool for web analytics, and for many
organizations it is a hugely powerful solution that helps them get started quickly.
But despite the popularity of tools like Google Analytics (and other packaged
tools), organizations run into a number of challenges when they rely only on
packaged tools for their web analytics. From browser privacy challenges to
data silos and lack of control, it’s worth exploring what these challenges mean
for your business on a practical level, and why a move to a more modular
stack could be a better approach in the long term.
That being said, packaged tools are popular for a reason. It wouldn’t be fair –
or accurate – to say that all companies should ignore packaged analytics
solutions, and for many teams starting out on their data journey, packaged
tools offer distinct advantages.
Early in the data maturity journey, it’s often not wise or necessary to build out
a complex technology stack. This is where packaged analytics can shine.
They are quick to set up and get going. A huge advantage of packaged tools is
that they deliver value quickly. They can give you a quick understanding of
how users are interacting with your websites and platforms, while you can
always build out a wider set of use cases later.
They offer an all-in-one solution. Packaged analytics tools are exactly that
– a package, which means data collection, modeling and visualization are all
included. This eliminates the need to hunt down and purchase multiple
solutions, which is particularly advantageous at an early stage when
resources are limited.
They’re easy to use. While this may not be a major benefit to data teams or
engineers, marketing teams and other internal data consumers can easily
self-serve data from packaged analytics tools like Google Analytics, without
being SQL proficient.
However, for all their advantages and simplicity, packaged solutions have
their drawbacks.
It can be tempting to stay with the analytics tools you’ve grown used to. The
risk here is that you’re not fulfilling the potential of one of your greatest
business assets: your behavioral data. Packaged analytics solutions are
limited in the following ways:
This can be especially problematic for organizations that do not fit the mould
of the typical e-commerce transaction, such as jobs boards or marketplaces
with multiple users.
• They are black boxes. Let’s imagine you’ve set up your packaged
analytics tool, and you’re beginning to explore data about your web
visitors. For some of your web pages, your bounce rate looks pretty
high. Why is that?
At this point, you have no control over how ‘bounce rate’, ‘time spent’
or other important web metrics are calculated. You may not even know
how your data is captured and processed. Where is it hosted? What
logic goes into defining certain events?
Since packaged tools are closed off, you cannot look under the hood and
discover (let alone change) the way your web data is being manipulated. It’s
also often difficult (or impossible without paying large fees) to obtain and
work with the raw data, before it becomes opinionated and modeled. For
organizations beginning to recognize data as one of their most important
assets, this is a red flag. It means you’re handing over control and ownership
of your valuable behavioral data to a third party.
• They are siloed. Many companies are realizing the strategic benefits to
building a single customer view. That is to say, unifying data sets from all
your platforms and channels to construct a cohesive understanding of
your users.
But with packaged analytics tools, it’s extremely difficult to unify data
in this way, because your data is siloed off and structured completely
differently from data captured from, say, social media, CRM, and other
channels. Without the ability to structure the data the way you’d like,
or access to the raw data, your data is stuck in your packaged analytics
tools, where its value is limited to only a few use cases, perhaps just
reporting and analysis. Which brings us to our next drawback.
• They are limiting. The way companies work with data is constantly
evolving. We’ve seen companies like Spotify use behavioral data to give
their listeners unique experiences such as their weekly recommended
playlists. There are now a number of game-changing use cases that can
be achieved with behavioral data, from personalized content to
product analytics and customer journey mapping, and the list is growing.
To get there, you will need to consider how to shape your end-to-end data
infrastructure, from data capture, to modeling and transformation, to
warehousing/storage, visualization and more. It will require investigation
into a number of different options, and evaluating the choices between
building, buying or running open source versions of the best-in-class solutions.
Your data team will likely lead the charge towards building a future-proof
data stack. But that doesn’t mean they should build all their own solutions.
There is a growing market of cutting-edge technologies for web analytics (and
wider use cases) for you to explore.
We’ll cover the best tools for your web analytics in more detail in an upcoming
chapter of this eBook, but for now, here are some key categories to consider
when putting together your stack:
• Data Visualization
To provide your internal data consumers with the best insights, you’ll need a
solution for visualizing and exploring the data. Look out for tools that make it
possible for teams to self-serve data, without creating bottlenecks.
• Data Monitoring
Measuring and improving your data quality is a huge factor in getting the most
from your web data. These tools will help you build assurance in your web data,
so your internal teams can be confident their data is reliable and trustworthy.
• Tag Management
Tag management systems (TMSs) are at the heart of your web analytics
and marketing. They are especially important when it comes to setting
cookies and capturing key information about your users and visitors (while
respecting their privacy). Consider a TMS that allows for server-side tagging
and is compatible with your other technologies.
• Testing/Debugging
Testing your web analytics stack for tracking failures is not the most exciting
aspect of your stack, but it’s one of the most important. We recommend
integrating tracking checks into your automated test suites, so you can
ensure new builds don’t ship without properly functioning trackers.
• Data Transformation
Transforming, reformatting and modeling your data are all essential to ensuring
your internal teams can act on the data set that is most relevant to them.
A good data transformation tool will enable you to turn raw data into actionable
data sets that are understood and trusted by cross-functional teams.
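As an example of the Testing/Debugging recommendation above, the sketch below adds a tracking assertion to an ordinary `unittest` suite. Everything in it is hypothetical: `FakeTracker` is a test double that records events instead of sending them, and `complete_checkout` stands in for your own application code.

```python
import unittest

class FakeTracker:
    """Hypothetical test double: records events instead of sending them."""

    def __init__(self):
        self.events = []

    def track(self, name, **properties):
        self.events.append({"name": name, **properties})

def complete_checkout(tracker, order_id, total):
    """Hypothetical application code under test."""
    # ... order processing would happen here ...
    tracker.track("purchase", order_id=order_id, total=total)

class CheckoutTrackingTest(unittest.TestCase):
    def test_purchase_event_is_tracked(self):
        tracker = FakeTracker()
        complete_checkout(tracker, order_id="A-1001", total=59.99)
        # The build should fail if the purchase event is missing or malformed.
        self.assertEqual(len(tracker.events), 1)
        self.assertEqual(tracker.events[0]["name"], "purchase")
        self.assertEqual(tracker.events[0]["order_id"], "A-1001")

if __name__ == "__main__":
    unittest.main()
```

Injecting the tracker as a parameter is what makes the check possible: production code receives the real tracker, while the test suite receives the recording double.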
If you’re unsure where to start, our internal experts can help you identify your
immediate needs and scope out how you can realize your ambitions with
behavioral data. It’s worth remembering that your organization’s experience
with data is a journey: there is nothing wrong with starting small, and
building as you grow.
CHAPTER 4
THE BEST IN CLASS TOOLS FOR WEB ANALYTICS
Building a data stack like this opens up opportunities to do more with your data.
But it isn’t easy. It means finding, researching and evaluating a number of
vendors to find the tools that work best for your business.
To make it easier, we’ve compiled a list of key tools to consider when building
out your web analytics stack. It’s not exhaustive, but a combination of these
solutions will put you in a good place for leveraging behavioral data from
web and other sources.
Data warehouse
One of the best ways to start making the most of your web and behavioral
data is to load it into a data warehouse. This not only lets analysts slice and
dice the data in any way they wish, but also scales as data volumes increase
over time. The best data warehouses also offer standout features such as
integrations with other analytical products and services, and extra
capabilities such as ML or querying semi-structured data (such as JSON).
(Check out this post from Poplin to see how the major data warehouse
solutions compare.)
Redshift
Redshift started the popularity of cloud-hosted data warehouses when it
launched in 2013. Its ease of use and low cost (compared to the popular
on-prem solutions available at the time) drove huge adoption of Amazon's data
warehouse. It has struggled somewhat in recent years to keep up with the
innovation of its competitors, but new RA3 node types (which separate
storage and compute, previously tightly coupled together) and recent
feature announcements such as Redshift ML and the SUPER data type (with
fuller JSON support than ever) are making Redshift a more appealing choice
again. Tight integration with AWS services (such as S3, SageMaker and Glue)
and reserved pricing for predictable cost forecasting are also big selling points.
BigQuery
Google's cloud data warehouse (which was used internally for a long time to
analyze Google's search index) is now available as a pay-as-you-go web
service (DWaaS). With great integrations into the rest of GCP (Google
Dataflow, Google Cloud Storage, Google Cloud ML, etc.) as well as the Google
marketing stack (Google Ads/Search Ads 360, DoubleClick, Ads Data Hub, etc.),
BigQuery is a great service to act as the center of all your marketing and
customer data efforts. It also has good support for nested or repeated JSON
records, supports real-time ingestion (through streaming inserts) and even
supports running ML algorithms with BQML.
Snowflake
Snowflake is a cloud data warehouse with some very powerful and unique
features, available on all three of the big cloud platforms. It separates storage
and compute (similarly to BigQuery) but allows further control through
separate Virtual Warehouses, which can each be a different size and suited to
a different purpose. Because the data is stored separately from these Virtual
Warehouses, Snowflake is probably the most scalable commercially available
data warehouse on the market, and we generally see our customers with the
highest volumes moving to Snowflake. Snowflake also has excellent support
for semi-structured JSON or XML data through its VARIANT data type, which
means Snowflake can also act as a data lake, helping to popularize the data
lakehouse framework.
Data Visualization
For most users, staring at a large and unwieldy table of numbers can be
daunting and hard to understand. In order to relay insights and findings to
other stakeholders in the business, your web analytics stack needs good
visualization capabilities.
Looker
Google's enterprise BI tool is aimed at companies who want to enable
self-serve analytics across their organisation. Its proprietary data modelling
syntax, LookML, allows analysts to define a metric once and have it used
by all end users throughout the business. It's specifically designed for cloud
data warehouses and takes advantage of their performance. It is currently
considered best in class, though it leaves something to be desired in the
range of visualizations it offers.
Tableau
One of the major players in the BI space for a number of years, Tableau is
enterprise-ready and leads the industry in drag-and-drop visualization
building. Tableau is the most capable tool in the space for creating custom
visualizations, and since it takes a low-code to no-code approach, it's
generally very easy for traditional BI analysts to use. Tableau leans heavily on
a legacy approach of loading data extracts onto its own servers to power its
dashboards, but is rolling out new features to enable more cloud-native
approaches to data visualization.
Power BI
Built on the Microsoft BI stack that has been popular for decades, this
Windows-only BI tool is popular with Excel analysts. It also has powerful data
modelling capabilities (through Power Query and its M formula language), and
is flexible enough to work with the popular cloud and on-prem data
warehouses. A very affordable price tag also makes it a good choice if you
want to start small and scale up.
Holistics
While Holistics hasn’t been in the BI market for long, it offers a powerful
combination of data governance, ELT/transformation and visualization
capabilities in a single attractive product. Entirely web-based, this service is
built from the ground up for the cloud, and uses the performance of cloud
data warehouses to ensure speedy dashboards. It is a great tool if you're
looking for a modern, all-in-one, cloud-native dashboarding and BI solution.
Data Monitoring
With the increasingly large volume and diversity of data flowing through your
website and into your points of analysis, it's more important than ever to
monitor your data quality at every stage. These tools check and alert on your
data quality across various points of your data lifecycle.
ObservePoint
A great tool for running automated scans on your website(s) to audit and
monitor your tagging setup. By default, it crawls every page and logs every
tag that fires on that page, but custom user journeys can be added (such as
checkout flows or product interactions), and it will alert you if at any point tags
stop firing or start firing incorrect or unexpected values. It is an enterprise-level
piece of software, with a price tag to match.
Iteratively
Iteratively helps teams catch analytics bugs before they hit production so you
don’t have to worry about bad data downstream. The product consists of two
parts: an intuitive web app where analysts, PMs and marketers can create
and evolve their tracking plan (ditching their spreadsheets), and developer
tooling for engineers to quickly and easily instrument tracking with type
safety and auto-complete. They work hand-in-hand to ensure event tracking
is implemented accurately and that the tracking plan is always enforced.
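The idea of enforcing a tracking plan can be shown in plain Python. This is not Iteratively's API or generated code; it is a hypothetical sketch in which the plan maps each event name to its required properties and their types, and every tracked event is checked against it.

```python
# Hypothetical tracking plan: event name -> required properties and types.
TRACKING_PLAN = {
    "product_viewed": {"product_id": str, "price": float},
    "added_to_cart": {"product_id": str, "quantity": int},
}

def validate_event(name, properties, plan=TRACKING_PLAN):
    """Return a list of problems; an empty list means the event conforms."""
    if name not in plan:
        return [f"unknown event '{name}'"]
    errors = []
    for prop, expected_type in plan[name].items():
        if prop not in properties:
            errors.append(f"missing property '{prop}'")
        elif not isinstance(properties[prop], expected_type):
            errors.append(f"'{prop}' should be {expected_type.__name__}")
    return errors

print(validate_event("added_to_cart", {"product_id": "sku-1", "quantity": "2"}))
# ["'quantity' should be int"]
```

Running a check like this in development or CI is what catches a string-typed quantity before it reaches production data.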
Great Expectations
Great Expectations is an open-source framework for running automated tests
against the data in your data pipeline. From simple tests, such as checking a
column for unique values, to more complex assertions, such as whether a value
is within 2 standard deviations of the median value for the entire column, GE
can run all sorts of tests on your data as it is ingested and transformed. We
use it at Snowplow in our latest V1 data models for BigQuery, Redshift and
Snowflake.
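The two example checks above can be sketched in plain Python to show what such assertions compute. This is not the Great Expectations API; it uses only the standard library, and choosing the population standard deviation (`pstdev`) rather than the sample one is an assumption of this sketch.

```python
from collections import Counter
from statistics import median, pstdev

def duplicate_values(column):
    """Uniqueness check: values that appear more than once."""
    return [value for value, count in Counter(column).items() if count > 1]

def outside_two_sd_of_median(column):
    """Values more than two (population) standard deviations from the median."""
    centre, spread = median(column), pstdev(column)
    return [v for v in column if abs(v - centre) > 2 * spread]

event_ids = ["e1", "e2", "e2", "e3"]
order_totals = [10, 11, 12, 13, 100]
print(duplicate_values(event_ids))              # ['e2']
print(outside_two_sd_of_median(order_totals))   # [100]
```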
Tag Management
Deploying tracking to your website is central to your data collection, data
quality and data privacy strategies. Tag management systems make it more
straightforward to do this at scale, and with the flexibility required to track
all customer interactions.
Tealium
Tealium's enterprise tag management system is aimed at organizations that
want more high-end features, such as granular access controls and
deployment workflows, and a more developer-friendly experience. It also
integrates with Tealium's CDP product.
Adobe Launch
Formerly known as DTM, this is the go-to choice if your infrastructure sits in
the Adobe ecosystem – Adobe Analytics, Adobe Target, Adobe Experience
Manager, and so on.
Testing/Debugging
When debugging any web implementation, it's important to be able to see
what the browser is doing and what data it is sending, where and when. These
Chrome extensions cover the dataLayer and common web analytics solutions,
and help spot common installation issues, as well as letting you check
whether the data being sent is correct. Use them both during implementation
(before publishing to production) and when investigating any issues.
Analysis Tools
Beyond visualizing your behavioural data (in dashboards and reports), there
are higher-level analyses you may want to run over your data. BI tools and
dashboarding solutions struggle to perform statistical analysis such as
predictive models, forecasts and dynamic segmentation models. These are a
couple of programming languages and packages aimed specifically at data
scientists and statisticians to get you started.
Data Transformation
In order to perform any analysis or generate any reports, your data will need
preparing. Transforming your data in a modern cloud data warehouse is a great
way to do this, as it is performant, cost effective and can easily scale up with
your data volumes. There are some great tools available to orchestrate this
in-warehouse pipeline.
Dataform - BigQuery
Dataform was recently acquired by Google Cloud, and is now focusing
on BigQuery specifically. Built on TypeScript and Node.js, Dataform works
almost entirely in the browser (though there is an open-source CLI tool),
providing instant compilation, automatic dependency inference, custom
JavaScript functions for repeating common tasks, and scheduling to run your
ELT pipelines inside BigQuery. It is also likely to get a lot of focus and
development from Google Cloud in the coming years.
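The in-warehouse (ELT) pattern these tools orchestrate can be sketched with Python's built-in `sqlite3`, used here purely as a stand-in for a cloud warehouse: raw events are loaded untouched, then a SQL transformation builds a modeled table inside the database. The table names and events are hypothetical, and this is not Dataform syntax; tools like Dataform add dependency management and scheduling around SQL steps like this one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# 1. Load raw, unmodeled events as-is (the "EL" in ELT).
conn.execute("CREATE TABLE raw_events (user_id TEXT, event_name TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("u1", "page_view", "2021-03-01T10:00:00"),
        ("u1", "purchase", "2021-03-01T10:05:00"),
        ("u2", "page_view", "2021-03-02T09:00:00"),
    ],
)

# 2. Transform inside the database with SQL (the "T"): one row per user.
conn.execute("""
    CREATE TABLE user_summary AS
    SELECT user_id,
           COUNT(*)                     AS events,
           SUM(event_name = 'purchase') AS purchases,
           MIN(ts)                      AS first_seen
    FROM raw_events
    GROUP BY user_id
""")

for row in conn.execute("SELECT * FROM user_summary ORDER BY user_id"):
    print(row)
# ('u1', 2, 1, '2021-03-01T10:00:00')
# ('u2', 1, 0, '2021-03-02T09:00:00')
```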
Data management
Snowplow
Snowplow is the leading platform for behavioral data management, including
web data. For data teams looking to get more from their behavioral data,
Snowplow offers unrivalled control and flexibility over your data set, as well
as complete ownership of your raw, unopinionated data.
While this list isn’t exhaustive, we hope it helps to get you started on your
journey to a more complete stack for web analytics. Once in place, your data
stack should evolve with your business, setting you up for success with
near-term goals as well as future aspirations. For this reason, although it takes
time, effort and investment to piece together a stack that’s effective for
modern web analytics, the hard work will be worth it in the long run.
CHAPTER 5
REDEFINING WEB ANALYTICS METRICS
Most web analysts have used the same common metrics to measure website
performance for a number of years. Some of these metrics include:
• Conversion rate
• Bounce rate
• Time on page/Session duration
Most of these metrics are provided out of the box by packaged tools like
Google Analytics. They are designed to make it easy to understand whether
our site is effective at converting users, and whether users find our pages
and content engaging, and they are generally understood to work as follows:
• Conversion rate - the higher the better
• Bounce rate - the lower the better
• Time on page/Session duration - the higher the better
Conversion Rate
Conversion rate is designed to indicate how well a website is performing in
terms of pushing a user through a desired journey towards a desired
conversion – like purchasing a product, signing up for a demo or requesting a
call from a sales team.
This is a noble aim, and making the user journey a more enjoyable
experience for the end user is always a worthwhile effort.
However, the metric itself has problems. The first thing to be aware of is
Goodhart’s Law, often summarized as “when a measure becomes a target, it
ceases to be a good measure”: focusing on a single metric like this can have
unintended side effects.
This seems like a misguided approach, as there are numerous ways a brand
can add value to a user who is not ready to buy just yet, which may turn them
into a customer later on: creating helpful informational content, providing
honest comparisons and buying guides, and nurturing the user journey until
they are ready to purchase. At that point, they will be much more likely to
return to the site that helped them make up their mind to complete the
purchase or conversion, and as a result, more likely to stay a customer and
become an advocate.
There are also more concrete technical problems with conversion rate. The
biggest issue is that most of the time the conversion rate metric is based on
visits or sessions – total number of identified conversions divided by the total
number of web visits.
[Figure: the user journey, from Awareness to Consideration to Conversion]
As explained above, the user may be on a long and complex multi-visit
journey, but not quite ready to convert right now. A session-based conversion
rate would count this visit as a non-converting session, which pulls your
conversion rate down. Yet this very user could convert in the future; even
then, the majority of their visits will be discounted as if they were “bad”
sessions, pushing down the conversion rate and suggesting the website is not
performing well.
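To make the arithmetic concrete, here is a minimal sketch comparing a session-based conversion rate with a user-based one. The visit log is invented for illustration: user "a" visits twice to research and converts on the third visit, while user "b" converts on their only visit. The `Visit` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    user_id: str     # cookie-based user identifier (hypothetical sample data)
    converted: bool  # did a conversion happen during this visit?


def session_conversion_rate(visits: list[Visit]) -> float:
    """Conversions divided by the total number of visits (sessions)."""
    return sum(v.converted for v in visits) / len(visits)


def user_conversion_rate(visits: list[Visit]) -> float:
    """Share of distinct users who converted in at least one visit."""
    users = {v.user_id for v in visits}
    converters = {v.user_id for v in visits if v.converted}
    return len(converters) / len(users)


visits = [
    Visit("a", False),  # research visit
    Visit("a", False),  # research visit
    Visit("a", True),   # converts on the third visit
    Visit("b", True),   # converts straight away
]

print(session_conversion_rate(visits))  # 0.5
print(user_conversion_rate(visits))     # 1.0
```

Both of user "a"'s research visits count against the session-based rate, even though they were part of a journey that ended in a conversion.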
There are ways to identify authenticated users (who log in to your site and
so identify themselves), even across devices. But they are generally a
minority of users, so this isn’t a viable option for most businesses.
That isn’t to say that businesses should not concern themselves with their
conversion rates, but rather that they should look at the conversion rate
metric in the right context and not be blinded by it. Conversion rates can
be useful for visits with commercial intent (visits where the landing
page is a product page, or PPC traffic from branded keywords, for instance)
but are less helpful when intent is informational (landing on a content page
from organic search) or when the likelihood to convert is low (for example,
visits from a mobile device when the product is of very high value).
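One way to put conversion rate in context is to segment it by the likely intent of each visit. This sketch assumes every session already carries a hypothetical `intent` label (derived, say, from the landing page type or traffic source as described above); the labels and sample data are illustrative.

```python
from collections import defaultdict


def conversion_rate_by_intent(sessions):
    """Return {intent: conversion rate} for a list of (intent, converted) pairs."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for intent, converted in sessions:
        totals[intent] += 1
        conversions[intent] += int(converted)
    return {intent: conversions[intent] / totals[intent] for intent in totals}


# Hypothetical sessions: commercial intent (product landing page, branded PPC)
# versus informational intent (content page reached from organic search).
sessions = [
    ("commercial", True), ("commercial", False),
    ("commercial", True), ("commercial", False),
    ("informational", False), ("informational", False),
    ("informational", False), ("informational", True),
]

rates = conversion_rate_by_intent(sessions)
print(rates)  # {'commercial': 0.5, 'informational': 0.25}
```

A single blended rate (3 conversions in 8 sessions) would hide the fact that the two segments should be judged against different expectations.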
Bounce Rate
Bounce rate was once described by Avinash Kaushik as the “I came, I puked, I
left” metric back in 2007. It is supposed to signify the proportion of your
users who landed on your site, quickly decided that your site was not what
they were looking for, and left instantly.
In practice, many analysts and users rely on this metric to measure how a
landing page is performing, even though bounce rate has been widely
criticized across the analytics industry.
The problems start to occur when we take this as our definition of a bounced
session. Under this definition, if there is no other tracking set up to track
interactions on the page, it is possible for a “bounced” session to actually be
a very valuable session.
Consider a user who lands on a piece of content, spends time on the page,
scrolls down to the end and reads the content in full. They may even bookmark
the page, or copy the link and send it to friends or colleagues, depending on
the exact type of content (not all content is inherently shareable). Having
read the content, they’re happy they’ve got what they need, and close the
browser tab. This is likely to have been counted as a bounced session, and so
contributes towards increasing the site’s overall bounce rate. And since a
higher bounce rate is generally considered a bad thing, this counts as a
“bad” visit, whereas in reality it was a good visit: the content answered the
user's question and gave them what they were looking for.
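A minimal sketch of the hit-count definition at work, assuming (as many packaged tools do) that a session counts as a bounce when it contains only one tracked interaction. The `Session` fields are hypothetical; note that the time the reader actually spent plays no part in the classification.

```python
from dataclasses import dataclass


@dataclass
class Session:
    tracked_hits: int       # page views plus any other tracked interactions
    seconds_on_page: float  # what actually happened; invisible to the definition


def is_bounce(session: Session) -> bool:
    """Hit-count definition: a single tracked hit means the session bounced."""
    return session.tracked_hits == 1


# The reader described above: four minutes spent reading to the end, but with
# no scroll or engagement tracking only the initial page view is recorded.
engaged_reader = Session(tracked_hits=1, seconds_on_page=240)
print(is_bounce(engaged_reader))  # True
```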
Another example is a visit to a retail site’s page showing the location and
opening times of a physical store. The user gathers all the information they
need quickly, and then closes the window. Again, the page has served its
purpose perfectly, but still generates a bounced session: therefore, another
“bad” visit.
Analysts have seen this happen, and know that a “high” bounce rate is “bad”.
As a result, some teams use a metric known as “adjusted bounce rate”.
This is where the tracking implementation is tweaked in order to bring
bounce rate down: for instance, if the user stays on the page for more than
30 seconds, the session is not treated as a bounce, even if they then leave.
This practice of tweaking or “fixing” the metric is generally not a good
idea, as you are not addressing the underlying cause of the metric; rather,
you are focusing on the metric itself and ignoring the real issue. Addressing
that issue means creating content that better fits the user’s intent, or
optimizing the page for a better user experience.
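The “adjusted bounce rate” tweak can be sketched as follows, using the 30-second threshold from the example above and an invented sample of single-page sessions. The function name and data are hypothetical; the point is that the reported rate falls while user behavior stays exactly the same.

```python
def bounce_rate(seconds_per_session, adjusted=False, threshold=30):
    """Share of single-page sessions counted as bounces.

    With adjusted=True, any session lasting longer than `threshold`
    seconds is no longer treated as a bounce.
    """
    bounces = [s for s in seconds_per_session if not (adjusted and s > threshold)]
    return len(bounces) / len(seconds_per_session)


# Hypothetical time spent (seconds) in five single-page sessions.
single_page_sessions = [5, 12, 45, 90, 240]

print(bounce_rate(single_page_sessions))                 # 1.0
print(bounce_rate(single_page_sessions, adjusted=True))  # 0.4
```

Nothing about the pages or the content has changed; only the definition has.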
Bounce rate is a useful metric when used appropriately. A good use of bounce
rate would be to look across all similar pages (all the content pages within a
/blog/ section of a site for example) and compare the bounce rate across all
of these pages. If most of these pages have roughly similar bounce
rates, but one or two have a significantly lower bounce rate, then it’s worth
looking into these pages to understand why. This insight could prove
valuable when creating future content.
Conversely, if a few pages have a significantly higher bounce rate, then
these are worth investigating too. Always make sure you’re making a fair
comparison: understand the user intent behind those pages and how users got
there, and segment, segment, segment.
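A sketch of the comparison described above: compute each page’s bounce rate within a section such as /blog/, then flag pages that sit far from the section’s typical value. The median baseline, the 15-percentage-point tolerance and the page data are all illustrative assumptions, not a standard method.

```python
from statistics import median


def flag_outliers(page_stats, tolerance=0.15):
    """Flag pages whose bounce rate differs markedly from the section median.

    page_stats maps a page path to (bounced_sessions, total_sessions).
    Returns {path: "lower" | "higher"} for pages outside the tolerance.
    """
    rates = {path: b / t for path, (b, t) in page_stats.items()}
    baseline = median(rates.values())
    flags = {}
    for path, rate in rates.items():
        if rate < baseline - tolerance:
            flags[path] = "lower"
        elif rate > baseline + tolerance:
            flags[path] = "higher"
    return flags


# Hypothetical /blog/ pages: (bounced sessions, total sessions)
blog = {
    "/blog/a": (60, 100),
    "/blog/b": (62, 100),
    "/blog/c": (58, 100),
    "/blog/d": (30, 100),
    "/blog/e": (85, 100),
}

print(flag_outliers(blog))  # {'/blog/d': 'lower', '/blog/e': 'higher'}
```

In practice you would segment further (by traffic source or device, for example) before drawing conclusions about the flagged pages.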
However, as you may have guessed, there are both conceptual and technical
problems with measuring time on page/site. The first point to consider is this:
Does a higher time on site really mean that users are more engaged with
the content or the site? It is true that if users enjoy the content on a
site, they may spend more time reading the articles, and potentially browse
to other articles or pages and read them too.
The problem is that a user who is not enjoying the content or is struggling to
use the site could also spend more time on the site. What if the user interface
is confusing and the user can’t navigate the site easily? This is likely to mean
they will spend longer on the site as well. Or a user who is struggling to read
the content because it is too complicated to follow or poorly written? There’s
no real way to differentiate between these two very different types of user
experience just by examining the time spent on the site.
There are also issues from a technical standpoint. Most tools that measure
the time spent on a page or on the site use the difference in timestamps
between consecutive page views or other events. The problem with this is
that it doesn’t account for whether the user was actually at their screen.
If you view a page, read for 20 seconds, leave your screen to make a coffee
for 10 minutes and return before the session times out, the tool is likely
to assume you’ve spent those 10 minutes looking at the page, even though you
weren’t.
Snowplow handles this with page ping events: while the user is active on the
page, the JavaScript tracker sends a “ping” at regular intervals. If the user
is inactive, no pings are sent, so that time is not taken into account when
calculating how long was spent on the page.
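The two approaches can be compared on the coffee-break example above. This is a simplified sketch, not Snowplow’s own modeling logic: it assumes a 10-second ping interval and treats engaged time as the number of pings received multiplied by that interval. Function and variable names are hypothetical.

```python
HEARTBEAT_SECONDS = 10  # assumed page ping interval


def time_on_page_from_timestamps(page_view_ts, next_event_ts):
    """Naive approach: the gap between this page view and the next event."""
    return next_event_ts - page_view_ts


def engaged_time_from_pings(ping_timestamps):
    """Ping-based approach (simplified): one interval per ping received.

    Pings are only sent while the user is active, so idle time
    produces no pings and is not counted.
    """
    return len(ping_timestamps) * HEARTBEAT_SECONDS


# Page view at t=0; the user reads for 20 s, leaves for 10 minutes,
# then returns and views another page at t=620.
page_view_at = 0
next_page_view_at = 620
pings = [10, 20]  # pings sent only during the 20 s of reading

print(time_on_page_from_timestamps(page_view_at, next_page_view_at))  # 620
print(engaged_time_from_pings(pings))  # 20
```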
The first thing to say is that these metrics aren’t always the wrong thing to
use. It’s just important to understand how they are measured and calculated,
so that you can decide, for your particular case, whether these metrics are
appropriate.
Given this, are there alternatives to these common metrics that can be used
in their place? Or different applications of these metrics that will make them
more meaningful?
Session-based and user-based conversion rates both suffer from the somewhat
flimsy definitions of “sessions” (a collection of hits within a timeout
window, among other factors) and “users” (a cookie ID, unique to a
browser/device combination). One way to make conversion rate more meaningful
is to view it in the context of other important events (sometimes called
micro conversions).
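One way to read conversion rate in the context of other important events is to report it alongside the rate of a micro conversion, and to compute the final conversion rate among sessions that reached that micro conversion. The event names (`viewed_pricing`, `signed_up`) and the sample sessions are hypothetical.

```python
def rate(sessions, event):
    """Share of sessions that contain the given event."""
    return sum(event in s for s in sessions) / len(sessions)


def conditional_rate(sessions, event, given):
    """Share of sessions containing `given` that also contain `event`."""
    qualifying = [s for s in sessions if given in s]
    return rate(qualifying, event) if qualifying else 0.0


# Each session is represented by the set of notable events that occurred in it.
sessions = [
    {"page_view"},
    {"page_view"},
    {"page_view", "viewed_pricing"},
    {"page_view", "viewed_pricing"},
    {"page_view", "viewed_pricing", "signed_up"},
]

print(rate(sessions, "signed_up"))       # 0.2
print(rate(sessions, "viewed_pricing"))  # 0.6
print(conditional_rate(sessions, "signed_up", "viewed_pricing"))  # 1 in 3
```

Read together, these say more than the headline 20% alone: most sessions reach the pricing page, and the drop-off happens after it.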
While this helps make the metric somewhat more reliable, the most
important thing to change is your mindset when analyzing the data. Make
sure to take into account things such as how long the content is, what the
user intent was when they landed on the page (whether from a search
engine, a social media post, a referring site or an ad) and any multimedia
content (video or audio, for example) that might change the user’s behaviour. Once these
factors are considered, you are in a better position to interpret what a
particular metric might be indicating.
The ultimate aim is to create metrics that are completely custom to your site
or product, and to limit your use of “standard” metrics to only the most
top-level analyses. This takes a deep understanding of your site, your users
and their journeys. But once you have these customised, much more meaningful
metrics, drawing insights from your data becomes much easier.
CHAPTER 6
DATA MODELING FOR WEB ANALYTICS
Viewing data modeling in this light leads to the realization that data models
are data products, analogous to any other tech asset or software product that
adds significant value to an organization.
All of these different types of users and more will be performing both
expected and unexpected actions on your website or application. Much of
the noise is filtered out because it is never tracked, but even with the best
tracking design and protocols, some of this noise can end up in the final
dataset. It is the job of the data model owner to understand what noise is
inherent in the final dataset and to decide what should be done about it.
1. Aggregation
Event-level data can be difficult for the casual user to understand, as the
concept of an event is relatively abstract. Aggregating up to more familiar,
higher-order concepts can help to promote understanding of the data. For
example, if a user wanted to understand in-session behaviour over time, they
might be more inclined to query a ‘sessions’ table than an ‘events’ table.
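Aggregation can be sketched with Python’s built-in `sqlite3` module: a small, invented events table is rolled up into one row per session. The column names mirror a few Snowplow `atomic.events` fields (`event_id`, `domain_sessionid`, `event_name`, `derived_tstamp`), but the table here is a simplified stand-in, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT, domain_sessionid TEXT, "
    "event_name TEXT, derived_tstamp TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("e1", "s1", "page_view", "2021-03-18 09:00:00"),
        ("e2", "s1", "page_ping", "2021-03-18 09:00:10"),
        ("e3", "s1", "page_view", "2021-03-18 09:01:00"),
        ("e4", "s2", "page_view", "2021-03-18 10:00:00"),
    ],
)

# Roll event-level rows up into one row per session.
conn.execute(
    """
    CREATE TABLE sessions AS
    SELECT domain_sessionid AS session_id,
           MIN(derived_tstamp) AS session_start,
           SUM(event_name = 'page_view') AS page_views,
           COUNT(*) AS events
    FROM events
    GROUP BY domain_sessionid
    """
)

rows = conn.execute(
    "SELECT session_id, page_views, events FROM sessions ORDER BY session_id"
).fetchall()
print(rows)  # [('s1', 2, 3), ('s2', 1, 1)]
```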
2. Filtration
In the simplest case, if upfront validation is not performed on event-level
data before it lands in the data warehouse, then poor-quality data will need
to be filtered out of the final dataset. For example, an anomalous
transaction event that is not filtered out of a marketing attribution model
could result in inefficient allocation of marketing spend.
The events table contains a large number of fields. Only the fields that are
relevant to a particular data model table should be selected; this helps to
improve the signal-to-noise ratio for any downstream analysis.
By default, internal users and bots should be removed from any final
analytics dataset. Your SQL data model is the ideal place to define what
constitutes an internal user or bot (often this decision is informed by the
Snowplow IAB enrichment).
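A minimal filtration sketch. The rules are hypothetical placeholders for your own definitions: internal users are identified by an assumed set of office IP addresses (documentation-range examples), and bots by a simplified boolean `is_spider_or_robot` flag standing in for the output of an enrichment such as the IAB one, which in reality attaches its own context with its own field names. `user_ipaddress` is a real `atomic.events` field.

```python
# Hypothetical definitions; in practice these belong in your SQL data model.
INTERNAL_IPS = {"203.0.113.10", "203.0.113.11"}  # assumed office addresses


def keep_event(event: dict) -> bool:
    """Return True if the event comes from a real, external visitor."""
    is_internal = event["user_ipaddress"] in INTERNAL_IPS
    is_bot = event["is_spider_or_robot"]  # simplified stand-in for an enrichment flag
    return not (is_internal or is_bot)


events = [
    {"event_id": "e1", "user_ipaddress": "198.51.100.7", "is_spider_or_robot": False},
    {"event_id": "e2", "user_ipaddress": "203.0.113.10", "is_spider_or_robot": False},
    {"event_id": "e3", "user_ipaddress": "198.51.100.8", "is_spider_or_robot": True},
]

clean = [e["event_id"] for e in events if keep_event(e)]
print(clean)  # ['e1']
```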
The Snowplow tracker comes with built-in semantics to ensure that every
event is sent at least once. This will inevitably result in a small number of
duplicate events in the final dataset, and these duplicate events should be
filtered out prior to data consumption.
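At-least-once delivery means the same `event_id` can arrive more than once. One common approach, sketched here with invented rows, is to keep the earliest copy of each `event_id` by `collector_tstamp` (a real Snowplow field). The tie-breaking rule is an assumption; production models may apply additional checks before treating two rows as duplicates.

```python
def deduplicate(events):
    """Keep the earliest-collected copy of each event_id."""
    earliest = {}
    for event in events:
        current = earliest.get(event["event_id"])
        if current is None or event["collector_tstamp"] < current["collector_tstamp"]:
            earliest[event["event_id"]] = event
    return list(earliest.values())


events = [
    {"event_id": "e1", "collector_tstamp": "2021-03-18 09:00:01"},
    {"event_id": "e2", "collector_tstamp": "2021-03-18 09:00:05"},
    {"event_id": "e1", "collector_tstamp": "2021-03-18 09:00:03"},  # retried send
]

deduped = deduplicate(events)
print([(e["event_id"], e["collector_tstamp"]) for e in deduped])
# [('e1', '2021-03-18 09:00:01'), ('e2', '2021-03-18 09:00:05')]
```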
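-- sessions CTE inputs (the URL-to-page-type rules are illustrative only)
WITH page_views AS(
SELECT ev.domain_sessionid AS session_id
, CASE WHEN ev.page_urlpath = '/' THEN 'homepage'
WHEN ev.page_urlpath LIKE '/guides/%' THEN 'how-to-guide-article'
ELSE 'other' END AS page_type
, ROW_NUMBER() OVER(PARTITION BY ev.domain_sessionid
ORDER BY ev.derived_tstamp) AS page_view_index
FROM atomic.events AS ev
WHERE ev.event_name = 'page_view'
)
, sessions AS(
SELECT session_id
, COUNT(*) AS page_views_in_session
, MAX(CASE WHEN page_view_index = 1 THEN page_type END) AS first_page_visited
FROM page_views
GROUP BY 1
)
, time_in_session AS(
SELECT domain_sessionid AS session_id
, COUNT(DISTINCT event_id) * 10 AS time_in_session -- assumes a 10-second page ping interval
FROM atomic.events
WHERE event_name = 'page_ping'
GROUP BY 1
)
, pre_agg AS(
SELECT s.session_id
, CASE WHEN s.page_views_in_session = 1
AND s.first_page_visited = 'homepage' THEN 'bounce'
WHEN s.page_views_in_session = 1
AND s.first_page_visited = 'how-to-guide-article'
AND COALESCE(tis.time_in_session, 0) < 60 THEN 'bounce'
ELSE 'quality_session'
END AS bounce
FROM sessions AS s
LEFT JOIN time_in_session AS tis
ON s.session_id = tis.session_id
)
SELECT bounce
, count(1)
FROM pre_agg
GROUP BY 1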
This is a relatively complex query with multiple CTEs, each one querying a
different event type and applying different operations to the event-level
data. These CTEs then have to be joined together into a final table, which
uses a case statement to classify sessions as either quality sessions or
bounced sessions.
There are many problems with this query, including but not limited to:
- The level of SQL for this basic analysis is too advanced: users who are
not fluent in SQL would not be able to build such a query;
- Introducing multiple steps into a query means there are more
places where mistakes can be made;
- The query is not optimized and contains expensive, slow-running
window functions;
- There is no version control on the case statement: every user
who wants to analyze bounce rate has to know where to find
the latest version of the case statement in order to perform
similar analysis self-sufficiently;
- The query directly queries the events table, meaning it
unnecessarily scans a large amount of data every time it runs.
The end result of all of this is that any reporting or visualisation based on
querying only event-level data is likely to produce a reporting setup that is
very difficult to maintain and very expensive to run. A better approach is
needed.
This better approach would be to codify all of this logic in a central, versioned
data model that might allow for the following query:
SELECT bounce
, count(1) AS sessions
FROM derived.sessions
WHERE session_start_date = '2021-03-18'
GROUP BY 1
This allows the user to use simple SQL, or even a drag-and-drop tool, to
calculate bounce rate for a specific date or date range with minimal effort.
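To show how little work is left for the analyst once the logic lives in the model, this sketch uses Python’s built-in `sqlite3` to create a tiny, invented `derived.sessions` table (an attached in-memory database named `derived`) whose `bounce` column has already been computed. It then runs the simple query above and turns the counts into a bounce rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so the table can be addressed as derived.sessions.
conn.execute("ATTACH DATABASE ':memory:' AS derived")
conn.execute(
    "CREATE TABLE derived.sessions "
    "(session_id TEXT, session_start_date TEXT, bounce TEXT)"
)
conn.executemany(
    "INSERT INTO derived.sessions VALUES (?, ?, ?)",
    [
        ("s1", "2021-03-18", "bounce"),
        ("s2", "2021-03-18", "quality_session"),
        ("s3", "2021-03-18", "quality_session"),
        ("s4", "2021-03-18", "quality_session"),
        ("s5", "2021-03-19", "bounce"),  # different date, excluded by the filter
    ],
)

counts = dict(
    conn.execute(
        """
        SELECT bounce, count(1) AS sessions
        FROM derived.sessions
        WHERE session_start_date = '2021-03-18'
        GROUP BY 1
        """
    ).fetchall()
)
bounce_rate = counts.get("bounce", 0) / sum(counts.values())
print(bounce_rate)  # 0.25
```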
Each box in this diagram represents a table in the data warehouse. The events
table contains our immutable unopinionated event log. Each table is dependent
on the table below it, and data is aggregated and filtered incrementally in the
operations that take place between each step of the model.
This is a preparatory data model that contains core business opinion, such as
which marketing parameters map to which channel, what constitutes a
bounced session, and what constitutes a conversion.
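As an illustration of the kind of business opinion such a model encodes, here is a hypothetical channel classification based on marketing parameters. `mkt_medium`, `mkt_source` and `refr_medium` are real Snowplow `atomic.events` fields, but the rules and channel names below are assumptions; every organization will define its own.

```python
def classify_channel(mkt_medium, mkt_source, refr_medium):
    """Map marketing parameters to a channel name (illustrative rules only)."""
    medium = (mkt_medium or "").lower()
    if medium in {"cpc", "ppc", "paidsearch"}:
        return "Paid Search"
    if medium == "email":
        return "Email"
    if medium in {"social", "paid_social"}:
        return "Social"
    if mkt_source:
        return "Other Campaign"
    if refr_medium == "search":
        return "Organic Search"
    if refr_medium == "social":
        return "Organic Social"
    if refr_medium is None:
        return "Direct"
    return "Referral"


print(classify_channel("cpc", "google", "search"))  # Paid Search
print(classify_channel(None, None, "search"))       # Organic Search
print(classify_channel(None, None, None))           # Direct
```

Keeping rules like these in one versioned model means every report agrees on what, say, “Paid Search” means.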
This data model is a good starting point for an organization that tracks web
data only. But for any organization that has customer touchpoints outside of
the web it is extremely valuable to integrate these into the data model and
build a single customer view. An example of this is provided below where a
mobile data model analogous to the web data model has been created and
the results have been unified to build such a single customer view.
[Figure: events feed page_views (web) and screen_views (mobile); these roll up into web_sessions and mobile_sessions, which are unified in single_customer_view]
Single customer views like this that capture customer touchpoints across a
variety of media are hugely valuable due to the unparalleled insight they can
offer into customer behavior.
For example, any attribution model that combines both web and mobile
touchpoints to capture the whole customer journey will be far more accurate
than an attribution model that can only attribute marketing credit to
single-device customer journeys.
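A minimal sketch of unifying web and mobile sessions into a single customer view, assuming both session tables already carry a common `user_id` (for example from a login). The table shapes, the function name and the data are hypothetical; real identity stitching is usually more involved.

```python
from collections import defaultdict

web_sessions = [
    {"user_id": "u1", "session_id": "w1", "platform": "web"},
    {"user_id": "u2", "session_id": "w2", "platform": "web"},
]
mobile_sessions = [
    {"user_id": "u1", "session_id": "m1", "platform": "mobile"},
    {"user_id": "u1", "session_id": "m2", "platform": "mobile"},
]


def single_customer_view(*session_tables):
    """Return {user_id: {platform: session_count}} across all session tables."""
    view = defaultdict(lambda: defaultdict(int))
    for table in session_tables:
        for session in table:
            view[session["user_id"]][session["platform"]] += 1
    # Convert the nested defaultdicts to plain dicts for readability.
    return {user: dict(platforms) for user, platforms in view.items()}


scv = single_customer_view(web_sessions, mobile_sessions)
print(scv)  # {'u1': {'web': 1, 'mobile': 2}, 'u2': {'web': 1}}
```

User u1’s journey spans both platforms; a web-only model would see just one of their three sessions.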
Investing in your strategic data asset is one of the best things you can do to
build an advantage in today’s competitive landscape. In our next chapter,
we’ll explore how Snowplow can help organizations take advantage of their
behavioral data from the web.
CHAPTER 7
HOW SNOWPLOW CAN HELP YOUR WEB ANALYTICS
Web analytics has become a more vibrant, fractured and challenging industry
in recent years. From humble beginnings, websites have evolved from static
web pages into compelling web experiences. They can now host game-changing
features such as personalization, dynamic pricing and content
recommendations to make browsing a richer, more rewarding experience. And
the teams behind them (developers, product teams, data teams and engineers)
are laser-focused on understanding the user experience at a granular level, in
order to make incremental improvements on a constant basis.
And getting this data is another huge challenge. In part, this challenge is a
logistical one. It requires a data team to establish a successful data
management practice that will make the most of the data. It requires a suite
of tools that will take the data on a journey from the point of capture,
through enrichment, modeling and storage, to visualization and reporting. It also requires
a significant investment, not just in terms of cost and effort, but also a unified
internal effort to align data objectives with the wider business and forge a
culture of data excellence across the organization.
In part, this is because our tooling has not evolved at the same pace.
Packaged tools helped us get started with web analytics, and at their best,
they can help us get off the ground at the start of our data journey. But as
businesses grow and our reliance on data increases, the limitations of these
tools prove costly and frustrating.
This is because:
• Packaged analytics don’t provide the flexibility and control over your
data in how it’s captured or structured.
• Privacy updates such as ITP mean that tracking with third-party
cookies is increasingly unreliable.
• Relying on packaged tools forces you to outsource your data
collection approach to a third party. For example, you don’t get
to decide what counts as a ‘conversion’ or a ‘bounce’; the tool
decides it for you.
• Packaged tools are ‘black-boxes’ – it isn’t possible to see what
happens to your data under the hood.
• Third-party tools that model your data do not take your unique
business model or logic into account. Data is aggregated according
to a standard approach based around the ‘page view’, ‘session’ and ‘user’.
• Packaged tools don’t provide access to your raw data, limiting your
ability to leverage data beyond basic reporting.
We know that companies winning today are the ones who use behavioral
data to cultivate a strong understanding of their users and their needs. To get
there, modern organizations should look to move from ad-hoc data
functions, siloed off in their marketing, product and BI teams, to a centralized
strategic capability that can empower the whole business.
While there is too much to be said on this subject to cover it sufficiently here,
the goal of the strategic data capability is to create a centralized, high-quality
data asset that can provide insights, power use cases and inform decisions
for all internal teams.
The first step for companies embarking on this path is to take full control of
their data. Built from the ground up with ownership and flexibility in mind,
Snowplow is a solution that can help data teams make this crucial step on
their data journey.
There are multiple reasons why Snowplow is the solution of choice for
modern web analytics. The following examples are just the beginning.
It’s your choice how the data is used – for whatever use case or company goal you
are striving for. Snowplow data is flexible and does not prescribe a particular
approach or assumption on how your data should be utilized. You decide how
the data should be modeled, and ultimately used, to grow your business.
Snowplow data arrives clean, well structured and ready to use in your data
warehouse. All data collected by Snowplow is validated against JSON schemas,
set up according to the requirements of your unique tracking plan. The result
is that behavioral data delivered by Snowplow requires little cleaning or
reformatting before your data consumers can put it to work.
And because Snowplow infrastructure is yours, you can configure your data
pipeline in a way that makes sense for your business, with no vendor lock-in
or preference for certain tools.
With total ownership of your data and freedom over your end-to-end
infrastructure, you can choose how you’d prefer to work with your web data asset.
“The gist is that once you have all the relevant data
for each event, which is possible with Snowplow,
you can do whatever you want with it. Snowplow’s
importance will only continue to grow as we
customize our pipeline.”
Rahul Jain, Principal Engineering Manager at Omio
CHAPTER 8
HOW WELCOME TO THE JUNGLE TOOK OWNERSHIP OF THEIR WEB DATA WITH SNOWPLOW
Aurélien Rayer, the company’s Head of Data, quickly found that its Google
Analytics stack was not able to keep up with the growing demands of a
fast-growth company.
One major problem was lack of ownership. Without owning their raw data, it
was impossible to combine data sets or implement user stitching to build the
full picture of the customer journey across all platforms and channels. But
worse than that – the data didn’t add up. Some users were going missing
altogether, conversion rates didn’t look accurate, and it was clear that the
behavioral data provided by Google Analytics wasn’t telling the whole story.
While this was a challenge internally, it was also a cause of concern for
clients. Employers wanted to know the conversion rates of their ads, and
what they could do to improve them.
With complete web data now seen as business-critical, Aurélien decided to move
to Snowplow for more reliable behavioral data capture. Welcome to the Jungle
could now deploy first-party, server-side tracking, finding that this enabled them
to overcome the restrictions of Safari’s ITP and other browser privacy measures.
With Snowplow, Welcome to the Jungle could track users without worrying
that third-party cookie data would be deleted, giving them a much more
complete view of user activity. Snowplow’s first-party tracking also meant
that Aurélien could track web visitors who used ad blockers (as long as they
granted their consent). These users could not previously be tracked with
Google Analytics, since ad blockers often automatically prevent tracking as a
side effect of blocking ads.
Once the Snowplow tracking was in place, Aurélien was able to compare his
new data set with the data sets he was capturing from Google Analytics. The
results were astonishing. Welcome to the Jungle gained visibility into 3%
more unique users each month – which amounted to around 600,000 users
who were previously invisible. In addition, Aurélien was able to examine user
behavior more closely to discover that over 2 million of the company’s user
ids were bots. This meant the platform could waste less time displaying ads
to bot accounts and focus their targeting on real users.
The transition for Welcome to the Jungle from third-party tracking with Google
Analytics, to first-party tracking with Snowplow had a huge impact on their
business. Now the product team could deliver accurate insights to clients,
and take steps to help them improve their conversion rates where required.
But their web analytics transformation didn’t stop there. Welcome to the
Jungle also benefited from the flexibility Snowplow offers when it comes to
tracking key metrics. Using ‘page pings’, a feature of Snowplow’s core web
tracker, Welcome to the Jungle gained a more accurate understanding of user
engagement with their media articles than an out-of-the-box tool could offer.
With Snowplow, Welcome to the Jungle were able to take ownership of their
behavioral web data, and leverage first-party tracking to get a complete
picture of their users and how they interact with their web content.
Aurélien and his team were now equipped, not only to capture web data in a
way that made sense specifically for their organization, but to own their
behavioral data set. This opens up a host of possibilities for the future,
allowing them to explore use cases such as building a view of the customer
journey or powering content recommendation systems.
snowplowanalytics.com