Creating and Using a Content Segmentation Model for Web Analytics

By John Hughes

11 September 2025

Part 2 of 5

A Deep Dive into Content Segmentation for Enhanced Insights

Introduction

In this second part of our blog series about measuring content, we take a deep dive into content segmentation models for web analytics. Where we reference technical analytics techniques, we will stick to how Google Analytics 4 and Google Tag Manager operate, but the principles described here are universal in terms of the relationship between content and web analytics.

If you want to find out more about how Storm can help you audit, analyse or measure your digital service content, get in touch with our team!

12 to 14 min read

Understanding the content segmentation model

A content segmentation model is a structured framework that defines the types of content you have, the relationships between them, and the attributes each type of content possesses. It serves as a blueprint for organising your content, ensuring consistency, and enabling more efficient data management and analysis.

Why a content segmentation model is essential

The primary purpose of such a segmentation model is to provide a clear and consistent structure for your content. This structure facilitates better data collection, storage, and analysis, ultimately leading to more informed decision-making. Here are some key benefits of a well-defined segmentation model:

Consistency
A content segmentation model ensures that all content is organised and labelled uniformly, making it easier to manage and analyse.
Data quality
By having clear definitions for the attributes and relationships between different content types and parameters, a content segmentation model helps maintain high data quality.
Efficiency
A well-structured content segmentation model helps to streamline the process of content creation, management, and analysis.
Scalability
A content segmentation model provides a scalable framework that can grow with your organisation, accommodating new content types and attributes as needed.

Types of dimensions in a segmentation model

Before you create a content segmentation model for your website or digital service, it is important to consider the scope of the dimensions you might include in the model. Typically, dimensions come in three types:

Single aggregators

These dimensions always have just one value per web page, but the same value may be shared by multiple pages.

Some examples of single aggregators that might be relevant for content include:

Page topic
For example, the page might be about the topic “Parking fines”. It is unlikely to be the only page on this topic; if it is, the topic is probably too granular.
Category
For example, it might be categorised as “Roads and traffic”. Typically, we would expect categories to be less granular than topics, and although there is no hierarchy required between category and topic, sometimes one is implied by your content organisation.
Page purpose
For example, ‘guidance’. The types should be values that indicate the purpose of the content, such as guidance, transactional, signposting, etc., depending on the type of service you are providing.
Content type
The technical content type or page template. Tracking this may help you optimise page templates by understanding their relative performance better.
Word count
Clearly a specific word count might be quite a granular dimension, but it may prove useful in data analysis, and it is possible many pages by chance have the same word count. However, you may prefer to record word count thresholds such as “Zero to 100 words”, “101 to 300 words”, “301 to 1000 words” and so on. Choose thresholds that helpfully divide your content, otherwise it may all end up clustered in one group!
Organisation department
This means the responsible owner of the content, i.e. who is accountable for its performance.
Primary desired outcome
As the flipside to user need, this is the one thing you as an organisation want users to do on this page above all else: for example, submit a form, click a download link, or follow a specific signpost link.
Content age or publish date
It may be useful for you to understand the content lifecycle. For example, does content engagement decline over time, or conversely does freshening up the content too frequently perhaps confuse the audience? This dimension could be a granular date or age, or you might categorise it into lifecycle stages such as “Just published”, “Over 1 year old” and others.
Sentiment
Is the page conveying positive information or negative information? Alternatively, is it primarily expressing a process instruction, a legal command, requesting user information, or just providing information to the user?
Text reading age
Tracking the text reading age of content is a great way to monitor how content engagement and performance changes with text complexity.
Reading time
By inferring a reading time for each page, either through direct measurement, or inference from the word count, it is possible to measure how the reading time might have a relationship with page performance.
Number of images on a page
By knowing the number of images on a page, you would be able to draw conclusions from data about what a good number of images is for content, segmented by other model dimensions. You may also consider the content relevance of images, although this is a less direct thing to measure.
Number of headings on a page
Like images, the number of headings on a page may help you understand the impact of headings versus overall word count, and how this impacts performance.
Position in site IA
This dimension could be an indicator of the number of clicks from the home page required for a user to find this content through normal site navigation.
Folder depth
Similar to the position in the IA, but measuring how many folders deep the page sits in the URL.
Page load weight
By page load weight, we mean the data transferred to the browser to load the web page. What we are really concerned with here is the environmental impact of your web page, in other words its carbon footprint. Although it can vary depending on very many factors such as your web hosting choices, a user’s screen brightness and so on, the data transfer of the web page to users’ browsers can account for around 40% of your CO2 footprint. How much can you improve the CO2 footprint of your digital service before its performance suffers? Most carbon footprint measurements focus on measuring your homepage, but it is much more effective, and a better balance against service performance, to use a granular measure such as this to optimise individual pages.
Page complexity
Based potentially on a combination of the above measures, an indication of the overall complexity of the page; for example, based on word count, number of headings, number of images and other measures. What counts as complex should be relative to the other content on the digital service to make this actionable – it would not be useful to describe all content on a service as very complex, for example, as this doesn’t enable you to compare content across the service. What counts as complex is likely to differ from organisation to organisation.
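
Several of the dimensions above are derived from raw values rather than entered by hand. As a minimal sketch, assuming you already hold each page's word count, URL and publish date, the banding described above could look something like this. The function names, the band edges beyond those quoted above, the 200 words per minute reading speed and the 30 day “Just published” window are all illustrative assumptions, not recommendations.

```javascript
// Hypothetical helpers for deriving banded single aggregator values.

// Map a raw word count to a threshold label.
function wordCountBand(wordCount) {
  if (wordCount <= 100) return 'Zero to 100 words';
  if (wordCount <= 300) return '101 to 300 words';
  if (wordCount <= 1000) return '301 to 1000 words';
  return 'Over 1000 words'; // assumed label for the top band
}

// Count the folders in a URL path.
// Assumption: a final segment without a trailing slash is the page itself,
// so /roads/parking/fines has a folder depth of 2.
function folderDepth(pageUrl) {
  const path = new URL(pageUrl).pathname;
  const segments = path.split('/').filter(Boolean);
  return path.endsWith('/') ? segments.length : Math.max(segments.length - 1, 0);
}

// Infer a reading time in whole minutes from the word count.
function readingTimeMinutes(wordCount, wordsPerMinute = 200) {
  return Math.max(1, Math.ceil(wordCount / wordsPerMinute));
}

// Turn a publish date (ISO 8601 string) into a lifecycle stage label.
function lifecycleStage(publishDate, today) {
  const ageInDays = (today - new Date(publishDate)) / (24 * 60 * 60 * 1000);
  if (ageInDays <= 30) return 'Just published';
  if (ageInDays <= 365) return 'Under 1 year old'; // assumed middle stage
  return 'Over 1 year old';
}
```

Whichever bands you settle on, check how your pages distribute across them before committing, so that the content does not all end up in one group.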

Multiple aggregators

These dimensions may have multiple values per page, and multiple pages per value.

Some examples of multiple aggregators that might be relevant for content include:

Author / Contributors
Content will usually have only one author, so you might think of it as a single aggregator, but it is possible that multiple people contribute or write content. If tracking the authors or contributors is important to your content segmentation model, the dimension should be treated as a multiple aggregator for this reason.
Tags or navigation facets
Tagging is often applied to blog posts and similar content, often to facilitate navigating content based on a broader content dimensional model. For example, tourism content may be tagged with locations, activity types, suitable audiences (e.g. for families) etc.
Target audience
Recording who the target audiences are for each piece of content is useful as it allows aggregation of metrics matching who the content was intended for. This can be matched against aggregation of demographic model dimensions to build inferences about how well the content is reaching target audiences. Note that you will not be able to record granular demographic data about individuals reading your content, and so audience matching is the next best alternative.
SEO search queries
Understanding the search queries that drive users to individual content pages is helpful in both evaluating the SEO performance of the page and in explaining anomalies in on-page performance. Like matching target audiences, this information can be used to optimise performance by prioritising content that has clear actionable improvements to make in SEO.

One-to-one values

These dimensions have one value per page, and one page per value.

Some examples of one-to-one values that might be relevant for content include:

Page URL
Page URL is a unique dimension in terms of segmentation models in that it is the only dimension that is almost certain to remain unchanged through the lifetime of the model, except during website redevelopment. Consequently, Page URL will be treated as a primary key in your model. You can think of your content segmentation model as a big data table with one row for each Page URL, and multiple columns, one for each dimension in your model.
Main heading
Tracking the main page heading can help you understand how changes to the heading might impact performance. However, a web page can only have one main heading at a time.
Page Title
Like main heading, recording the Page Title can help you understand how to optimise the Page Title for improved performance, particularly in reference to SEO dimensions such as search query performance.
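
To make the “big data table” idea concrete, here is a small sketch that holds one row per Page URL, with one property per dimension. The `upsertPage` helper, the property names and all the values are hypothetical examples rather than a prescribed schema.

```javascript
// A hypothetical in-memory model: Page URL is the primary key,
// and each row carries one property per dimension.
const contentModel = new Map();

function upsertPage(model, pageUrl, dimensions) {
  // Merge new dimension values into any existing row for this URL,
  // so the model never holds two rows for the same page.
  const existing = model.get(pageUrl) || {};
  model.set(pageUrl, { ...existing, ...dimensions });
  return model;
}

// Single aggregators: one value per page.
upsertPage(contentModel, 'https://example.com/parking/fines', {
  pageTopic: 'Parking fines',
  category: 'Roads and traffic',
  pagePurpose: 'Guidance',
});

// A later update for the same URL adds a one-to-one value and a
// multiple aggregator (held here as a list).
upsertPage(contentModel, 'https://example.com/parking/fines', {
  mainHeading: 'Pay a parking fine',
  tags: ['parking', 'payments'],
});
```

Because the URL is the key, adding dimensions later means adding columns to existing rows, not creating new ones.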

Keeping the dimension model clean

To maintain the cleanliness and effectiveness of your segmentation model, it is crucial to avoid duplication and ensure that each dimension holds a single, unambiguous value.

Duplication, where the same data is recorded in multiple dimensions, can lead to data bloat, which may result in inconsistencies and inaccuracies in your analysis.

Furthermore, inconsistent formats for values in the same dimension can break analysis, for example having values like “Guidance” and “guide” both in the content type dimension. Ensure such categorical data is drawn from controlled lists or cleaned to match controlled lists.
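
As a rough sketch of cleaning values to match a controlled list, the function below trims and lower-cases a raw value, maps known variants to the approved label, and returns `null` for anything it does not recognise so that it can be reviewed by hand. The list contents, the synonym map and the function name are assumptions for illustration.

```javascript
// Hypothetical controlled list for one dimension.
const CONTROLLED_VALUES = ['Guidance', 'Transactional', 'Signposting'];

// Hypothetical variants seen in the raw data, mapped to approved labels.
const SYNONYMS = new Map([
  ['guide', 'Guidance'],
  ['guides', 'Guidance'],
]);

function normaliseValue(rawValue, controlledList = CONTROLLED_VALUES, synonyms = SYNONYMS) {
  const key = String(rawValue).trim().toLowerCase();
  // Known variant: return the approved label.
  if (synonyms.has(key)) return synonyms.get(key);
  // Otherwise accept a case-insensitive match against the controlled list.
  const match = controlledList.find((value) => value.toLowerCase() === key);
  // Unknown values come back as null instead of being guessed at.
  return match === undefined ? null : match;
}
```

Returning `null` for unknown values is a deliberate choice in this sketch: it is easier to spot and fix a gap than a silently wrong label.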

Data sources and storage for the content segmentation model

Once you have defined the dimensions for your content segmentation model, the next step is to ensure that you have set up all the appropriate data sources. Not all the data is necessarily collected in one place. Therefore, describing the data sources for the model should form part of your overall measurement planning documentation – here at Storm we use a document type called a Site Tracking Audit Guide (STAG) to store such information.

Google Analytics 4 dimensions

Inevitably, much of the data you collect will be in your web analytics package, collected as part of your analytics tracking. Dimensions such as page URL and page title are tracked as standard. You may also be tracking content groupings.

Sending to GA4 via the dataLayer and Google Tag Manager

If you can access them from the CMS or elsewhere, you can track other content data using the dataLayer to pass additional parameters with page view events. The dataLayer allows you to pass variables to Google Tag Manager which can be sent with events that you track.

For example, you might send information about page topic, content type and publish date. The dataLayer push might look something like this:

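// Make sure the dataLayer exists before pushing to it
window.dataLayer = window.dataLayer || [];
window.dataLayer.push({
  'event': 'pageview',            // custom event name to use as the GTM trigger
  'pageTopic': 'Example Topic',
  'contentType': 'Article',
  'publishDate': '2024-10-18'     // ISO 8601 date
});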

Sending to GA4 via DOM elements and Google Tag Manager

You may have some content dimensions for your segmentation model which are consistently displayed on the page itself, such as tags and the main heading. If these are clearly and consistently formatted in HTML and CSS, you can likely collect these through a DOM Element variable in Google Tag Manager.

The DOM Element variable returns the text content of an HTML element (or the value of one of its attributes), selected by element ID or CSS selector, making it simple to collect content elements displayed on screen for the model.
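
A DOM Element variable is typically used to read a single element. Where several elements match, such as a list of tags, one possible alternative is a Custom JavaScript variable in Google Tag Manager. The sketch below shows the kind of logic such a variable might contain. The `collectText` name, the `.tag-list .tag` selector and the `|` delimiter are all assumptions, so adjust them to your own markup.

```javascript
// Collect the text of every element matching a CSS selector and join it
// into one delimited string, ready to send as an event parameter.
// Written in ES5 style to stay conservative about what GTM will accept.
function collectText(doc, selector, delimiter) {
  var nodes = doc.querySelectorAll(selector);
  var values = [];
  for (var i = 0; i < nodes.length; i++) {
    var text = (nodes[i].textContent || '').trim();
    if (text) values.push(text); // skip empty elements
  }
  return values.join(delimiter || '|');
}

// Inside a GTM Custom JavaScript variable, the same logic would sit in an
// anonymous function that returns the value, along the lines of:
//   function() { return collectText(document, '.tag-list .tag', '|'); }
// with collectText defined inside that function.
```

Passing the document in as an argument keeps the function easy to try out away from a live page.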

Collecting information via a crawl

Some of the information you want in your content segmentation model can be collected via a crawl using Screaming Frog or other similar tools. We will describe Screaming Frog’s functionality here for the purposes of example.

Screaming Frog is a tool that executes a crawl and scrape of your website to collect specific information. It is designed primarily to assist in search engine optimisation auditing by collecting data such as page metadata quickly and efficiently.

It will automatically collect information such as page URL, page title, main heading, file size, word count, Flesch Kincaid score, and other useful information which may feed into your content data model.

You can also use it to extract information from web pages, and it can be an alternative to using DOM Elements in Google Tag Manager. Bear in mind, though, that a crawl is a snapshot of data at a point in time, and it is not as useful for data points that change frequently. Screaming Frog exports data to CSV, which is useful as its own data storage type (as we will see shortly). You may prefer to edit and clean the CSV exports first, though, to remove data columns you don’t need, as Screaming Frog collects and subsequently exports a lot of data!
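
As an illustration of trimming a crawl export down to the columns the model needs, here is a minimal sketch. The column headers in `COLUMNS_TO_KEEP` are assumptions about how an export might be labelled, so check them against your own file. The tiny CSV parser handles quoted fields but not line breaks inside a field; a dedicated CSV library would be a safer choice for real exports.

```javascript
// Split one CSV line into fields, honouring double-quoted values.
function parseCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;
      else current += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ',') { fields.push(current); current = ''; }
    else current += ch;
  }
  fields.push(current);
  return fields;
}

// Keep only the wanted columns, renamed to the model's dimension names.
function trimCrawlExport(csvText, columnMap) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  const header = parseCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const cells = parseCsvLine(line);
    const row = {};
    for (const [sourceName, targetName] of Object.entries(columnMap)) {
      const index = header.indexOf(sourceName);
      row[targetName] = index === -1 ? null : cells[index];
    }
    return row;
  });
}

// Assumed export headers (left) mapped to model dimension names (right).
const COLUMNS_TO_KEEP = {
  'Address': 'pageUrl',
  'Title 1': 'pageTitle',
  'H1-1': 'mainHeading',
  'Word Count': 'wordCount',
};

// Made-up sample export with one column we do not need.
const sampleExport = [
  'Address,Title 1,H1-1,Word Count,Status Code',
  'https://example.com/parking/fines,"Parking fines, appeals and payments",Pay a parking fine,420,200',
].join('\n');

const crawlRows = trimCrawlExport(sampleExport, COLUMNS_TO_KEEP);
```

Note that every value comes back as text, so numeric columns such as word count need converting before analysis.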

Collecting information from Google Search Console

Google Search Console is a useful data source for pulling data about search queries that users might use to access your digital service. Additionally, it carries information about the crawl success of Google for your web pages, helping you to reduce errors, and improve performance in Google Search. Bing Webmaster Tools is a similar tool for Bing Search data.

Storing data in CSV, Excel or databases

Some of the data you might want in your content data model might not be easily collectable through Google Analytics, a crawl, or Google Search Console. In this case, it needs to be stored by some other method. As indicated earlier, you can think of the data as being a big flat table of data with a row for each page and a column for each dimension. In this sense, any other content dimensions that you need to store can be stored in a single database table, CSV or Excel worksheet in this manner.

That said, you might optionally store multiple aggregators in their own dimension tables with a row for each relationship of page to value. The alternative would be to store multiple values in a comma separated list or using JSON within the flat table. Which method you choose would normally be impacted by the toolkit you will choose to analyse the data with later, although it is not difficult to transform data from one format to the other.
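
To show how little work the transformation between the two formats involves, here is a sketch that converts a flat table with comma separated values into a dimension table and back again. The function names, field names and sample rows are made up for the example.

```javascript
// Flat rows holding a multiple aggregator as a comma separated list.
const flatRows = [
  { pageUrl: '/visit/castle', targetAudience: 'Families, Tourists' },
  { pageUrl: '/visit/trails', targetAudience: 'Walkers' },
];

// Flat table -> dimension table: one row per page-to-value relationship.
function toDimensionTable(rows, key, field) {
  return rows.flatMap((row) =>
    String(row[field] || '')
      .split(',')
      .map((value) => value.trim())
      .filter(Boolean)
      .map((value) => ({ [key]: row[key], [field]: value }))
  );
}

// Dimension table -> flat table: values for each page joined back together.
function toFlatTable(pairs, key, field) {
  const grouped = new Map();
  for (const pair of pairs) {
    if (!grouped.has(pair[key])) grouped.set(pair[key], []);
    grouped.get(pair[key]).push(pair[field]);
  }
  return [...grouped].map(([id, values]) => ({ [key]: id, [field]: values.join(', ') }));
}

const audiencePairs = toDimensionTable(flatRows, 'pageUrl', 'targetAudience');
const roundTripped = toFlatTable(audiencePairs, 'pageUrl', 'targetAudience');
```

The comma separated form only works if no value contains a comma itself; if yours might, JSON within the flat table or a separate dimension table is the safer option.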

You might want to do this for data fields that are difficult to collect elsewhere, but that change infrequently, such as target audience, organisation department, and sentiment.

Data analysis using your content segmentation model

To analyse content performance, you would bring your disparate data sources together in a data model in Looker Studio, Power BI or a similar data visualisation tool, or apply data science techniques to the data using Python or R.
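
Whichever tool you use, bringing the sources together comes down to joining them on the page URL. A minimal sketch of that join follows, assuming both sources have already been reduced to rows that share a `pageUrl` key in the same format. The metric names `views` and `engagementRate` and all the figures are placeholders.

```javascript
// Left join: every page in the model is kept, with metrics added where
// the analytics export has a matching row.
function leftJoinOnPageUrl(modelRows, metricRows) {
  const metricsByUrl = new Map(metricRows.map((row) => [row.pageUrl, row]));
  return modelRows.map((row) => ({ ...row, ...(metricsByUrl.get(row.pageUrl) || {}) }));
}

// Made-up rows from the content segmentation model.
const modelRows = [
  { pageUrl: '/parking/fines', pagePurpose: 'Guidance' },
  { pageUrl: '/parking/permits', pagePurpose: 'Transactional' },
];

// Made-up rows from an analytics export.
const metricRows = [
  { pageUrl: '/parking/fines', views: 1200, engagementRate: 0.62 },
];

const joinedRows = leftJoinOnPageUrl(modelRows, metricRows);
```

In practice the hardest part is often making the keys match, for example when one source records full URLs and another records only the path, so it is worth normalising URLs before joining.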

Segmentation

The model enables segmentation of your data based on the dimensions in the model. Segmentation allows you to compare different segments and identify trends, patterns, and areas for improvement.
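
As a simple sketch of what segmentation means in practice, the function below groups pages by any one dimension and totals a metric for each segment. The `views` metric, the `(not set)` label and the sample rows are assumptions for illustration.

```javascript
// Group rows by a dimension and summarise views for each segment.
function summariseBySegment(rows, dimension) {
  const segments = new Map();
  for (const row of rows) {
    // Pages missing the dimension are grouped together, not dropped.
    const value = row[dimension] == null ? '(not set)' : row[dimension];
    const totals = segments.get(value) || { pages: 0, views: 0 };
    totals.pages += 1;
    totals.views += row.views || 0;
    segments.set(value, totals);
  }
  return [...segments].map(([segment, totals]) => ({
    segment,
    pages: totals.pages,
    views: totals.views,
    viewsPerPage: totals.views / totals.pages,
  }));
}

// Made-up rows combining model dimensions with a metric.
const pageRows = [
  { pageUrl: '/parking/fines', pagePurpose: 'Guidance', views: 1200 },
  { pageUrl: '/parking/rules', pagePurpose: 'Guidance', views: 300 },
  { pageUrl: '/parking/permits', pagePurpose: 'Transactional', views: 900 },
];

const byPurpose = summariseBySegment(pageRows, 'pagePurpose');
```

Reporting views per page alongside the total helps avoid a segment looking strong simply because it contains more pages.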

Contextual Analysis

Adding context to your data can provide deeper insights and help you understand the factors influencing content performance. For example, you can analyse the impact of content changes such as rewritten content or a change in a page heading, or even improvement in SEO performance and how these things impact on-page performance.

Optimisation

The content segmentation model helps you isolate opportunities for optimisation, such as improving reading age, content length, CO2 impact or targeting specific audience segments. Recording changes in your model can also help you measure the impact of these optimisations over time.

Data Science Techniques

A well-structured content segmentation model provides richer data for applying data science techniques, such as machine learning, natural language processing, or sentiment analysis. These techniques can help uncover hidden patterns, predict future trends, and enhance your understanding of user behaviour.

The questions you can ask of the data in a content segmentation model like this include:

What reading age limits see a drop in on-page performance?
How does word count impact page performance?
Where can I draw the line between reducing file size (CO2 footprint) and maintaining service success?
How does content performance vary across page purposes?
How does page complexity impact page performance?
How does page complexity impact SEO?

Although the list of questions you could ask is almost infinite, you should focus on questions and hypotheses that are likely to be both actionable and impactful.
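
For a question such as how word count relates to page performance, one simple starting point is a correlation coefficient. The sketch below computes a Pearson correlation between two equally sized lists of numbers. The choice of engagement rate as the performance metric and the sample figures are assumptions, and a correlation on its own does not show that one thing causes the other.

```javascript
// Pearson correlation between two numeric samples.
// Returns a value between -1 and 1, or NaN if either sample has no variation.
function pearsonCorrelation(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('Need two equally sized samples with at least two values');
  }
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  const meanX = mean(xs);
  const meanY = mean(ys);
  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }
  return covariance / Math.sqrt(varianceX * varianceY);
}

// Made-up figures: word count and engagement rate for five pages.
const wordCounts = [120, 340, 560, 900, 1500];
const engagementRates = [0.71, 0.66, 0.58, 0.49, 0.35];

const wordCountVsEngagement = pearsonCorrelation(wordCounts, engagementRates);
```

A result close to -1 or 1 suggests a relationship worth investigating further, ideally segmented by other dimensions such as page purpose.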

Tracking Changes

Keeping your content segmentation model up to date is essential for maintaining data quality. Some information can be gathered using automated tools, such as web crawlers like Screaming Frog, which can scan your website and extract relevant data. However, other information may need to be updated manually. Establishing a process for tracking changes, such as regular audits or automated alerts, can help ensure your model remains accurate.
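
One way to support such a process is to compare the latest crawl against the previous one and list what has changed. The sketch below does this for two snapshots keyed by page URL. The function name, the field names and the sample rows are hypothetical.

```javascript
// Compare two snapshots of the model and report added, removed and changed pages.
function diffSnapshots(previousRows, currentRows, fields) {
  const previousByUrl = new Map(previousRows.map((row) => [row.pageUrl, row]));
  const currentUrls = new Set(currentRows.map((row) => row.pageUrl));
  const changes = [];

  for (const row of currentRows) {
    const before = previousByUrl.get(row.pageUrl);
    if (!before) {
      changes.push({ pageUrl: row.pageUrl, type: 'added' });
      continue;
    }
    // Report each tracked field whose value differs between snapshots.
    for (const field of fields) {
      if (before[field] !== row[field]) {
        changes.push({ pageUrl: row.pageUrl, type: 'changed', field, from: before[field], to: row[field] });
      }
    }
  }

  for (const row of previousRows) {
    if (!currentUrls.has(row.pageUrl)) changes.push({ pageUrl: row.pageUrl, type: 'removed' });
  }
  return changes;
}

// Made-up snapshots from two crawls.
const lastMonth = [
  { pageUrl: '/parking/fines', mainHeading: 'Parking fines', wordCount: 420 },
  { pageUrl: '/parking/old-page', mainHeading: 'Old page', wordCount: 90 },
];
const thisMonth = [
  { pageUrl: '/parking/fines', mainHeading: 'Pay a parking fine', wordCount: 420 },
  { pageUrl: '/parking/permits', mainHeading: 'Parking permits', wordCount: 250 },
];

const modelChanges = diffSnapshots(lastMonth, thisMonth, ['mainHeading', 'wordCount']);
```

The resulting list can feed a regular audit, or prompt a manual review of the dimensions that cannot be collected automatically.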

Summary

Creating and using a content segmentation model for web analytics is a powerful way to organise, manage, and analyse your content data. By defining key dimensions, documenting the data sources, and applying the model in data analysis, you can unlock valuable insights and drive more informed decision-making. Whether you are a content manager, data analyst, or digital marketer, a well-defined model is an essential tool for maximising the value of your web content.