I keep hearing about the cost and risk of having dirty data in your MOPs systems, but I’m not quite sure how to check if my company’s data is up to snuff. Do you have any advice for how to root out dirty data and how to prevent it from happening?
This is a great question, Dave—and one we should all be talking about.
Let me start by making sure we’re on the same page around what dirty (or rogue) data is. The short version is that dirty data is any data that has erroneous information in it. The slightly longer version is that there are different types of dirty data, including duplicates, errors and typos, outdated information, prospects that don’t align with your target persona, and incomplete entries (e.g. without an email address).
For example, you could have an entry for someone who has since left the company they worked at when they downloaded your ebook, so any email you send them will bounce. Alternatively, you could have an entry that wasn’t typed in properly, with “.con” instead of “.com” in the email address. Another thing that happens all the time is duplicate (or triplicate) entries. I’ve worked with companies that had thousands of duplicates in their database, and that’s not sustainable or practical.
You’re right that dirty data can be costly. One issue is email reputation. If you’re sending marketing emails to people who shouldn’t be in your database, for instance, and they mark your email as spam, that counts against your email sender reputation. This is a score that internet service providers use to decide whether to deliver your emails to the inboxes of the people on their network. The lower your score, the lower the chance that you reach your audience when you need to, and it can be really hard to recover from a low score.
Managing dirty data can also be expensive. Some martech platforms charge based on the number of entries in your database. For companies that have thousands of duplicates, that means they’re paying far more than they should. (The costs also add up when you have to pay for tools to clean that data.) Having that many duplicates also gives you an inflated picture of how many people are actually in your audience, which can lead you to make decisions that don’t make sense for your business.
So, how do you stay ahead of dirty data? Here are my suggestions.
- Build habits into your processes
Every six to 12 months, you should check your database to identify and remove any dirty data. Beyond just looking for duplicates and errors, this takes a concerted effort to identify the people who no longer fit your target persona, as well as those who haven’t engaged with your content for a set period of time.
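If you’d like to see what the "haven’t engaged" part of that check could look like, here’s a rough Python sketch. It assumes you can export your contacts as a list of records with a `last_engaged` date; that field name, the `find_stale_contacts` helper, and the 365-day default are all placeholders I’ve made up for illustration, so swap in whatever your own platform and policy call for.

```python
from datetime import date, timedelta

def find_stale_contacts(contacts, today, max_inactive_days=365):
    """Return contacts with no recorded engagement since the cutoff date.

    `last_engaged` is a hypothetical field name, and 365 days is an
    assumed threshold; neither comes from any particular martech tool.
    """
    cutoff = today - timedelta(days=max_inactive_days)
    return [
        contact
        for contact in contacts
        # Contacts with no engagement date at all are also flagged for review.
        if contact.get("last_engaged") is None or contact["last_engaged"] < cutoff
    ]

contacts = [
    {"name": "Ana", "last_engaged": date(2024, 1, 15)},
    {"name": "Bob", "last_engaged": date(2022, 3, 1)},
    {"name": "Cara", "last_engaged": None},
]
stale = find_stale_contacts(contacts, today=date(2024, 6, 1))
stale_names = [contact["name"] for contact in stale]
```

I’d treat the result as a review list rather than something to delete automatically, since some quiet contacts may still be worth keeping.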
- Take proactive steps
Dealing with dirty data shouldn’t just be remedial; there are also things you can do to avoid creating those errors in the first place. Duplicates, for instance, tend to happen when teams import lists from multiple sources (e.g. Marketo and Salesforce) without checking for repeat entries. So if you’re importing data, make sure there’s a check in place to flag duplicates. You should also clean up any list (e.g. check for missing email addresses) before it gets migrated. Lastly, building a process to identify and delete common bogus email addresses (like email@example.com or firstname.lastname@example.org) can also help you keep your data clean.
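To make those import checks a bit more concrete, here’s a minimal Python sketch of a pre-import filter. It’s only an illustration: the `clean_import` function, the record layout, and the simple email pattern are my own assumptions, not features of Marketo, Salesforce, or any other tool, and the bogus-address list just reuses the two examples above. It flags missing emails, malformed ones (including the “.con” typo), known bogus addresses, and duplicates, comparing emails case-insensitively.

```python
import re

# Placeholder addresses to reject; these are just the two examples from the
# text, so extend the set with whatever shows up in your own forms.
BOGUS_EMAILS = {"email@example.com", "firstname.lastname@example.org"}

# Deliberately simple shape check (an assumption, not a full email validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$")

def clean_import(incoming, existing_emails):
    """Split incoming records into (accepted, rejected-with-reason)."""
    seen = {email.strip().lower() for email in existing_emails}
    accepted, rejected = [], []
    for record in incoming:
        email = (record.get("email") or "").strip().lower()
        if not email:
            rejected.append((record, "missing email"))
        elif not EMAIL_PATTERN.match(email) or email.endswith(".con"):
            rejected.append((record, "malformed email"))
        elif email in BOGUS_EMAILS:
            rejected.append((record, "bogus email"))
        elif email in seen:
            rejected.append((record, "duplicate"))
        else:
            seen.add(email)  # later copies within the same import count as duplicates
            accepted.append({**record, "email": email})
    return accepted, rejected

incoming = [
    {"name": "Ana", "email": "Ana@Acme.test"},
    {"name": "Bob", "email": "bob@acme.con"},
    {"name": "Dee", "email": None},
    {"name": "Test", "email": "email@example.com"},
    {"name": "Cara", "email": "Cara@Acme.test"},
    {"name": "Cara again", "email": " cara@acme.test "},
]
accepted, rejected = clean_import(incoming, existing_emails={"ana@acme.test"})
reasons = [reason for _, reason in rejected]
```

Keeping the rejected records alongside a reason, rather than silently dropping them, gives someone on the team a chance to fix a typo instead of losing the contact.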
- Normalize your data
You’ve probably seen that some companies and teams use full country names (like Canada), while others use country codes (like CA). To keep your data clean, the best thing you can do is normalize your entries so that there aren’t any discrepancies in your data set.
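As a rough illustration, normalizing a country field could be as simple as a lookup table. The mapping below is a tiny made-up sample and `normalize_country` is a hypothetical helper; a real version would need to cover every spelling that actually appears in your data.

```python
# Hypothetical sample mapping from spellings seen in the data to one standard code.
COUNTRY_CODES = {
    "canada": "CA",
    "ca": "CA",
    "united states": "US",
    "usa": "US",
    "us": "US",
}

def normalize_country(value):
    """Return the standard code for a country value, or None if unrecognized."""
    key = (value or "").strip().lower()
    return COUNTRY_CODES.get(key)

normalized = [normalize_country(v) for v in [" Canada ", "ca", "USA", "Narnia", None]]
```

Returning `None` for anything unrecognized, instead of guessing, leaves those entries easy to find and review by hand.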
These might sound like small changes, but they’re important ones. Trust me, once you start doing these things, you’ll be able to have a lot more trust in your data.
You’ve got this,