How should you invest your marketing budget? This is the holy grail of marketing questions, and it involves a lot of variables—which creative, which offer, to which audience, via which platforms (or media outlets, publishers, etc.). What combination of variables will give you the biggest bang for the buck?
But before you can answer that, you have to understand what’s currently working and what isn’t across your marketing portfolio.
And before you can answer that, you need to reconstruct a picture of what you’ve done in the past.
It’s a hierarchy, with each layer of questions building on the one before it.
But before you can do any of that, you have to get your data together and into a state where it’s usable, accessible and trustworthy enough for analysis and insight. More and more people seem to understand that solving this “data problem” is key, and they also seem to understand that it’s hard. A widely shared New York Times article from a couple of years ago highlighted the challenge:
“Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
But what exactly is so hard about the data problem? What work is involved?
Attacking the Data Problem
There are a number of steps and challenges involved in collecting and preparing data—moving data from disparate places and noisy, inconsistent states into a clean, accessible, trustworthy and centralized state. It’s complicated and labor-intensive, and doing it well requires a data analyst or data ops skill set, sometimes even a software engineering skill set.
At a high level, the steps involved are:
1. Start with a plan
As with any complex task or process, your chance of succeeding is better if you know what you’re aiming for. Sometimes it’s impossible to hammer out the finer details before you know what’s in the source data, but you still need an upfront plan with clear goals and objectives, as well as a consensus around KPI definitions, reporting dimensions (geos, brands, etc.), data sources and owners.
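One lightweight way to make such a plan concrete is to write it down in a reviewable, version-controlled form. The sketch below is a hypothetical example of a plan captured as a Python structure; the KPI names, dimensions and sources are illustrative, not prescriptions:

```python
# A minimal sketch of an upfront data plan captured as code, so it can be
# reviewed, agreed on, and version-controlled. All names here (KPIs, geos,
# sources, owners) are hypothetical examples.
DATA_PLAN = {
    "kpis": {
        # Agreed-upon definition: total spend divided by attributed conversions.
        "cost_per_conversion": "total_spend / conversions",
    },
    "dimensions": ["geo", "brand", "channel"],  # agreed reporting dimensions
    "sources": {
        "search_ads": {"owner": "paid-media team", "delivery": "api"},
        "email_stats": {"owner": "crm team", "delivery": "sftp"},
    },
}

def missing_owners(plan):
    """Return the sources that have no named owner -- a gap to close upfront."""
    return [name for name, src in plan["sources"].items() if not src.get("owner")]
```

A simple check like `missing_owners` turns “consensus around data sources and owners” from a meeting note into something you can verify before the project starts.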
2. Understand the source data
This is also referred to as “profiling” the data. It’s where you figure out how messy the source data is, how it is structured, the level of granularity, the metrics and dimensions it contains, etc. This is also where you figure out what data to extract from the source, how to extract it, and the ways in which you will need to transform the data to make it usable.
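A profiling pass can start very simply. The sketch below, using only the standard library, reports row counts, per-column null counts and the mix of value types — enough to reveal, say, a numeric field arriving as text. Real profiling tools also look at cardinality, value ranges and duplicates; this is a minimal illustration:

```python
from collections import Counter

def profile(rows):
    """Profile a list of dict records: row count, per-column null counts,
    and the set of observed value types per column. A minimal sketch of
    the 'understand the source data' step."""
    null_counts = Counter()
    types = {}
    for row in rows:
        for col, val in row.items():
            if val in (None, ""):
                null_counts[col] += 1
            else:
                types.setdefault(col, set()).add(type(val).__name__)
    return {"rows": len(rows), "nulls": dict(null_counts), "types": types}

# Hypothetical messy source data: a missing value and a number stored as text.
sample = [
    {"zip": "10001", "spend": 120.0},
    {"zip": None, "spend": "95.5"},
]
report = profile(sample)
```

Seeing `spend` arrive as both `float` and `str` in the profile tells you up front that the transformation step will need a cleansing rule for that field.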
3. Collect the data
Some data lives in systems that have APIs (clearly defined mechanisms that different software systems can use to communicate with one another), and you can pull data out using a computer script or a specialized tool.
In other cases, the source data lives in spreadsheets or other files on a person’s computer, and that person can send the data somewhere (e.g., by email) or put it in an accessible location, like a secure FTP site or a Dropbox folder.
Source data can also live in databases or data warehouses and a variety of other places. Collecting the data means both figuring out how it will be delivered or transmitted to you, and figuring out what specific information to extract from the source (because often you don’t want everything).
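For an API-style source, the collection step often amounts to paging through results and keeping only the fields you need. The sketch below assumes a hypothetical `fetch(params, page)` interface standing in for whatever client or script talks to the real system — it is not a real library:

```python
def collect_pages(fetch, params, keep_fields):
    """Pull all pages from an API-style source and keep only the fields we
    need. `fetch(params, page)` is a hypothetical stand-in for a real API
    client; it returns a list of records, or an empty list when done."""
    rows, page = [], 1
    while True:
        batch = fetch(params, page)
        if not batch:
            break
        # Extract only the requested fields -- often you don't want everything.
        rows.extend({f: r.get(f) for f in keep_fields} for r in batch)
        page += 1
    return rows

# Stub fetcher standing in for a real API, for illustration only.
def fake_fetch(params, page):
    data = {1: [{"campaign": "a", "clicks": 10, "internal_id": 99}],
            2: [{"campaign": "b", "clicks": 7, "internal_id": 98}]}
    return data.get(page, [])

extracted = collect_pages(fake_fetch, {}, keep_fields=["campaign", "clicks"])
```

Separating the fetch mechanism from the extraction logic also makes it easier to swap in a different delivery method (SFTP drop, email attachment) later without rewriting the whole step.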
4. Transform and enrich the data
This is where the real magic happens. Transformation can involve many different operations—cleansing, validation, integration, joining, sorting, aggregation, derivation and mapping, to name a few. Enrichment simply means adding information that is not in the source data, like adding city names when the data includes only postal codes.
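The postal-code example can be sketched in a few lines. This is a minimal illustration of one cleansing rule plus one enrichment lookup; the postal-code table is a tiny hypothetical sample, and a real pipeline would chain many such operations:

```python
# A minimal sketch of transform-and-enrich: cleanse a spend field stored as
# text, then enrich each row with a city name looked up from its postal code.
# The lookup table here is a hypothetical two-entry sample.
ZIP_TO_CITY = {"10001": "New York", "60601": "Chicago"}

def transform(rows):
    out = []
    for row in rows:
        spend = float(row["spend"])                    # cleanse: text -> number
        city = ZIP_TO_CITY.get(row["zip"], "unknown")  # enrich: add city name
        out.append({"zip": row["zip"], "city": city, "spend": spend})
    return out

source = [{"zip": "10001", "spend": "120.50"},
          {"zip": "99999", "spend": "10"}]
clean = transform(source)
```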
Marketing is a domain where people are constantly innovating, and data sources and structures are forever changing, so data transformation solutions need ways to adapt to each change. But data also needs to be trustworthy. A data transformation error translates to analysis based on bad or inaccurate data. So, each change should be tested, governed and version-controlled according to software development lifecycle principles.
In short, the transformation step is where a lot of the heavy lifting happens for a source of truth—balancing agility and constant change with the need for accountability and accuracy.
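Applying software lifecycle principles to transformations can be as simple as pairing each rule with a unit-test-style check that runs on every change before it ships. A minimal sketch, with illustrative field names:

```python
# Transformation logic deserves the same safeguards as application code.
# Below: a hypothetical validation rule plus a unit-test-style check that
# can run on every change to the pipeline before it is deployed.
def validate(rows):
    """Flag rows that would silently corrupt downstream analysis."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("spend") is None or row["spend"] < 0:
            errors.append((i, "bad spend"))
        if not row.get("date"):
            errors.append((i, "missing date"))
    return errors

def test_validate_flags_bad_rows():
    rows = [{"spend": 10.0, "date": "2024-01-01"},  # good row
            {"spend": -5.0, "date": ""}]            # two problems
    assert validate(rows) == [(1, "bad spend"), (1, "missing date")]

test_validate_flags_bad_rows()
```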
5. Make the data available for analysis
This can simply mean outputting the data as CSV, Excel, Tableau or another format that a data analyst can open in their preferred analysis tool. But a scalable solution usually entails loading the data into a central repository alongside historical data and other useful context, ensuring that everyone who needs to access it is using the same structure.
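The central-repository idea can be sketched with SQLite standing in for a real warehouse. In practice this would be a shared system like BigQuery or Snowflake, and the table schema is a hypothetical example, but the point is the same: everyone queries one agreed structure:

```python
import sqlite3

# A minimal sketch of "making data available": load transformed rows into a
# shared table so every analyst queries the same structure. SQLite is a
# stand-in here for a real central warehouse.
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS spend (geo TEXT, spend REAL)")
    conn.executemany("INSERT INTO spend VALUES (?, ?)",
                     [(r["geo"], r["spend"]) for r in rows])
    conn.commit()

conn = sqlite3.connect(":memory:")
load([{"geo": "US", "spend": 100.0}, {"geo": "UK", "spend": 50.0}], conn)
total = conn.execute("SELECT SUM(spend) FROM spend").fetchone()[0]
```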
6. Orchestrate and maintain the whole process
In a data-driven organization, this isn’t a one-off. The goal is to have data flowing into your decisions in a continuous way. For large enterprises and even medium-sized organizations, this whole process must be repeatable (ideally automated), scalable, secure and well-governed.
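The steps above can be chained into a single repeatable run. The sketch below is a deliberately simple orchestrator — real deployments use schedulers such as Airflow or cron — but it shows the essentials: run the steps in order, stop on the first failure, and keep a log so the run can be audited and retried. The step functions are hypothetical placeholders:

```python
# A minimal sketch of orchestration: run pipeline steps in order, stop on
# the first failure, and record what happened for auditing and retries.
def run_pipeline(steps, data=None):
    log = []
    for name, step in steps:
        try:
            data = step(data)
            log.append((name, "ok"))
        except Exception as exc:
            log.append((name, f"failed: {exc}"))
            break  # don't run later steps on bad upstream data
    return data, log

# Placeholder steps standing in for the real collect/transform/load logic.
steps = [
    ("collect", lambda _: [{"spend": "10"}, {"spend": "20"}]),
    ("transform", lambda rows: [{"spend": float(r["spend"])} for r in rows]),
    ("load", lambda rows: sum(r["spend"] for r in rows)),
]
result, log = run_pipeline(steps)
```

Stopping at the first failure is a deliberate design choice: loading half-transformed data into a shared repository is usually worse than loading nothing and retrying.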
The bottom line
There are a number of approaches to “the data problem” that combine people, processes and technology in various ways. A viable solution generally has to account for all the steps described above—though there are differences in execution, like ETL (Extract, Transform, Load) vs. ELT (Extract, Load, Transform), which we’ll cover next.
The order of the steps isn’t fixed, and the detailed tasks and activities can differ, but the important thing to understand is that shortchanging any of the steps will make things slower, more costly and more error-prone.
It isn’t the sexiest stuff, but this blueprint for solving the data problem is what will ultimately determine whether your organization can share data for reporting and insight.