What Would The Data Look Like If The Model You Ruled Out Was What Was Really Happening

This is a four-part series

Click below to navigate to a previous chapter:

Chapter 1: The role of data modeling in the modern data stack
Chapter 2: Designing a data model that reflects your user journey
Chapter 3: Building data models with Snowplow and dbt
Chapter 4: How data modeling has evolved for modern data teams

Previously, we looked at the best practices for modeling Snowplow data in dbt. In this post, we'll look more broadly at behavioral data modeling, and at the opportunities that lie ahead for organizations looking to drive value with behavioral data.

In the early days of digital analytics, you didn't model your data. You didn't have to. Tools like Google Analytics and, later, Adobe Analytics were really the only analytics solutions available. Both products 'packaged up' the data journey, from collection to transformation and reporting, into a single solution.

At the time these products burst onto the scene, the computation required to transform data was expensive. This meant that GA and Adobe Analytics had to make hard decisions about how to structure data models in an efficient and cost-effective way. The traditional web data model was born.

The traditional data model

As we touched on in Chapter 2, the traditional web data model was essentially built around the idea of a 'session'.

Sessions were a way to summarize a series of interactions, aggregating certain web metrics over time to paint a picture of what's happening. Definitions of sessions varied, but by and large the industry standardized on the idea of a period of activity in which no two consecutive actions were separated by more than 30 minutes.
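
To make the 30-minute rule concrete, here is a minimal sketch of sessionization in Python. It is an illustration only, not how any particular analytics tool implements sessions: the function name `sessionize` and the sample timestamps are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold, matching the common 30-minute convention.
SESSION_GAP = timedelta(minutes=30)


def sessionize(timestamps, gap=SESSION_GAP):
    """Group one user's event timestamps into sessions (hypothetical helper)."""
    sessions = []
    for ts in sorted(timestamps):
        # Open a new session for the first event, or whenever the time since
        # the previous event is longer than the allowed gap.
        if not sessions or ts - sessions[-1][-1] > gap:
            sessions.append([ts])
        else:
            sessions[-1].append(ts)
    return sessions


# Made-up events for a single user on one day.
events = [
    datetime(2021, 5, 1, 9, 0),
    datetime(2021, 5, 1, 9, 10),
    datetime(2021, 5, 1, 9, 45),
    datetime(2021, 5, 1, 14, 0),
]
sessions = sessionize(events)
```

Note that the only thing deciding where one session ends and the next begins is the gap between events. Nothing about what the user was actually trying to do enters the definition.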

This is the most common way of modeling behavioral data on the web, but the approach comes with certain limitations.

First, there's a higher level of abstraction often missing from this view: the user level. A user-level view aggregates behavior across all of a user's sessions, which businesses find especially valuable. Different teams can use this data to answer specific questions, for example:

Product teams: am I getting better at retaining newer cohorts of users?
Marketing teams: are we acquiring high-value users?
Customer teams: how effective are we at growing the value of our customers over time?

Then there are lower, deeper levels of aggregation missing below the session level. The most common of these are funnels. While most web analytics tools let users create fixed "funnels" and analyze how users progress through them, they didn't give analysts any way to work with periods of the user journey shorter or longer than a session.

Or, to take a more concrete example of the limitations of sessionization, consider a non-traditional business model, say, a streaming service like Netflix. Within a platform like Netflix, the concept of a session doesn't make much sense, nor would it help a data team better understand more granular behavior, such as how a certain content series was consumed by audiences over several weeks or months.

How do consumption patterns vary between different series? (Are some more bingeable than others?)
How do consumption patterns vary between different user types?
To what extent is more bingeable content better at driving users to become subscribers? (Or to stay subscribed?)

These are the kinds of questions that dive deeper into user behavior and break the paradigms of traditional data models.

Digital analytics typically catered to standardized, transactional behaviors, such as retail or ecommerce purchases. But even in the retail industry, companies have evolved from these early ways of working with data. For example, many people use delivery services from grocery stores like Target, which follow a fairly typical 'item select, basket, order' business model. But what if you want to dive deeper into how different customers buy their items?

What if you wanted to know how long the user journey takes from filling a basket to placing an order? For a family purchasing their weekly groceries, this might play out over several days.
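
A question like this is easier to answer when events are modeled per user rather than per session. The sketch below is one possible way to measure basket-to-order time in Python. The event names `add_to_basket` and `order`, the `(user_id, event_name, timestamp)` tuple shape and the sample data are all assumptions for illustration, not a Snowplow schema.

```python
from datetime import datetime


def basket_to_order_durations(events):
    """Per user, measure the time from the first basket add to the next order.

    `events` is an iterable of (user_id, event_name, timestamp) tuples.
    The event names used here are hypothetical.
    """
    basket_started = {}  # user_id -> time of the first add in the open basket
    durations = {}       # user_id -> list of basket-to-order timedeltas
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        if name == "add_to_basket":
            # Only the first add after the previous order starts the clock.
            basket_started.setdefault(user_id, ts)
        elif name == "order" and user_id in basket_started:
            started = basket_started.pop(user_id)
            durations.setdefault(user_id, []).append(ts - started)
    return durations


# Made-up events: a family filling a basket over several days,
# and a shopper who orders within a single visit.
events = [
    ("family_1", "add_to_basket", datetime(2021, 5, 3, 8, 0)),
    ("family_1", "add_to_basket", datetime(2021, 5, 4, 19, 30)),
    ("family_1", "order", datetime(2021, 5, 6, 8, 0)),
    ("shopper_2", "add_to_basket", datetime(2021, 5, 3, 12, 0)),
    ("shopper_2", "order", datetime(2021, 5, 3, 12, 20)),
]
durations = basket_to_order_durations(events)
```

Because the calculation ignores session boundaries entirely, the family's three-day journey is measured as a single basket-to-order interval, which a session-based model would split across several unrelated visits.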

Modeling data to answer a broader set of questions

As businesses evolve, the questions they want to ask about their users become more nuanced. We can see that the traditional model, built around sessions, isn't suited to many businesses today, particularly since many modern websites are more like 'web products' than the standard web structures we saw in the early 2000s.

This evolution in data modeling was first recognized by Gary Angel, CTO of consulting firm Semphonic (later acquired by Ernst and Young), and a pioneer of advanced analytics. Angel predicted as early as 2008 that warehousing web data and stitching it with behavioral data from multiple sources would unlock a whole new level of opportunity in analytics.

"I think that probably the biggest direction in analytics is going to be increasing interest in data warehousing analytics data and joining it to customer data, really opening up that data from an IT perspective or platforms that are not just reporting platforms, give people a broad variety of analytic, modeling and statistical tools. The sophisticated organisations that we are working with that have sort of gone through the process of mastering web analytics, have stuck one or two years in getting a basic infrastructure in place, getting good at web analytics reporting. Where they are going now is warehousing that data, joining it to their customer data, opening it up to a whole new set of tools, a whole new set of analysts, a whole new set of reports." – Interview with Gary Angel in 2008

Gary Angel was exactly right. When Amazon launched its cloud data warehouse, Redshift, in 2012, it became possible (and cost effective, even at small scale) for companies to process data with solutions like Apache Hadoop, Hive and Spark, without relying on restrictive tools like Google Analytics. Angel foresaw that organizations would use these tools for advanced analytics to build a picture of their user journey, aggregating data in a way that made sense to them.

Once it was possible for companies to break out of the packaged analytics paradigm, it became important to make the jump for two key reasons:

Organizations didn't fit the traditional mould (i.e. standard, ecommerce-shaped customer journeys), such as subscription businesses like Gousto and two-sided marketplaces like Unsplash;
Organizations wanted to go deeper with their data, ask more dynamic questions and evolve their analytics over time.

As businesses evolve, so must their analytics

Today, many companies see their analytics as a way to differentiate. They are exploring ways to do more advanced things with their data. Even industries that have been well served by tools like Google Analytics are building their own data infrastructure or looking to the modern data stack to capture and model behavioral data. For example:

Media organizations are looking beyond standard sessionization to measure engagement with their content, with new ways to calculate time on page or to attribute the success of their advertising campaigns;
Retailers are experimenting with personalization and using behavioral data to measure buying propensity, so they can trigger offers and promotions at optimal times.

These are just two examples, but the move to advanced analytics has become more mainstream in almost every sector. Companies now demand more flexibility over how they capture and model behavioral data, and want the ability to go beyond basic reporting to use cases that deliver real business value.

Analysts, for example, used to be limited to building business reports. Implementing a new reporting system, from tracking to processing and visualizing the data, might take anywhere from nine to 18 months. For this reason, organizations would often be 'stuck' with their reports, unable to adapt to changes in the business or product.

But in today's competitive landscape, analytics has to keep up with the pace of the business. Product development, to take one example, is only possible when supported by robust analytics that can evolve as product features evolve, as the customer journey changes, or as the business grows. As many startups know, this is not a one-and-done process, but a cycle of continuous improvement.

To achieve this, data teams are increasingly turning to a more flexible toolkit to help them capture and model their data. Breaking out of the limitations of Google and Adobe Analytics, they're turning instead to a combination of solutions to meet the needs of their data journey, choosing from a marketplace of tools to build a 'modern data stack'.

Where Snowplow and dbt come in

Within this ecosystem, Snowplow and dbt stand out as highly flexible, open-source solutions that can equip data teams to ask more questions of their behavioral data.

With Snowplow, you have a robust set of tooling for capturing behavioral data, with full control and assurance that your data will land in your data warehouse in a clean, well-structured format. With dbt, Analytics Engineers can transform their data, build a seamless workflow for data modeling, and evolve their models over time to meet the needs of their business.

Crucially, both Snowplow and dbt are built from the ground up to help organizations evolve their behavioral data as they become more sophisticated with data and start using it in new ways, and as their business, products and services change. Traditional packaged analytics tools assumed that implementation was a one-off process and that there was no need to evolve the data beyond that. But modern organizations are constantly evolving the way they use their data.

Snowplow enables them to do this with, for example, the ability to version definitions of events and entities, and to roll out new versions in a controlled, stepwise fashion. Cloud data warehouses offer the compute power required to recompute metrics on top of the entire data set (if those definitions change). Better yet, dbt offers structured workflows that enable organizations to evolve their data models incrementally as their behavioral data changes (the Snowplow input) and their use cases multiply and evolve (the output).

In unison, Snowplow and dbt enable the data team to reach radically higher levels of productivity, helping organizations move much faster with their data than was possible with the last generation of tooling.

Where productivity is high and data flows freely through the business, it's faster and easier for the organization to answer key questions. It's these data-agile companies that are able to iterate quickly, making smart choices on the fly in product, marketing and other areas that make all the difference for their users.

Build better data models with rich, behavioral data