How Can I Avoid Having Dirty Data?

Hi Jo,

I keep hearing about the cost and risk of having dirty data in your MOPs systems, but I don’t know how to check if my company’s data is up to snuff.

Do you have any advice on rooting out dirty data and preventing it from happening?

Thanks,

Data-driven Dave


This is a great question, Dave!

It’s one we should all be talking about.

Let me start by making sure we’re on the same page about dirty data (sometimes called rogue data).

The short version is that dirty data is any data with erroneous information.

The slightly longer version is that there are different types of dirty data, including:

duplicates
errors and typos
outdated information
prospects that don’t align with your target persona, and
incomplete entries (e.g., without an email address).

Examples of dirty data:

You could have an entry from someone who no longer works at a company, so any email sent to them will bounce back.

Or you could have an entry with a typo, like an email address ending in “.con” instead of “.com.”

Duplicate (or triplicate) entries are another common data problem. I’ve worked with companies with thousands of duplicates in their database, which is not sustainable or practical.

Dirty data is a mess

Dirty data can indeed be costly.

Email reputation

Bad email reputation is a huge issue.

For instance, if you’re sending marketing emails to people who shouldn’t be in your database and they mark your email as spam, that counts as a ding against your email sender reputation.

Your sender reputation is a score that internet service providers use to decide whether to deliver your emails to the inboxes of the people on their network.

The lower your score, the lower the chance your email reaches your audience. It can be really hard to recover from a low score.

Database costs

Some martech databases charge based on the number of entries in your database.

For companies with thousands of duplicates, that can mean paying for records that add no value.

The costs also add up when you have to pay for tools to clean that data.

Having that many duplicates also gives you a false understanding of how many people are actually in your audience. It can lead you to make decisions that don’t necessarily make sense for your business.

Steps to avoid dirty data

So, how do you stay ahead of dirty data?

You can do it in-house, but it does require some heavy lifting (which we can help with).

Here are my suggestions.

Create a data hygiene plan

Bloated, inaccurate databases cause all kinds of problems.

Data hygiene is a company-wide project that gets your entire team on the same page.

A data hygiene plan standardizes how people collect and handle data across systems, and it schedules periodic audits to check the quality of your data and sources.

We wrote a Tough Talks Made Easy article outlining the steps you’ll want to take.

Build habits into your processes

Every six to 12 months, you should perform checks on your database to identify and remove any dirty data.

This process goes beyond just looking for duplicates and errors.

It requires a concerted effort to identify the people who:

no longer fit within your target persona, or
haven’t engaged with your content for a particular period.
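
As a rough illustration of the second check, here is a minimal Python sketch that flags contacts with no recent engagement. The field name last_engaged and the 365-day window are assumptions made for the example, not settings from any particular platform. Persona fit is a judgment call and is not covered here.

```python
from datetime import date, timedelta


def find_stale_contacts(contacts, today, max_idle_days=365):
    """Return contacts that never engaged or last engaged before the cutoff."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = []
    for contact in contacts:
        last = contact.get("last_engaged")  # hypothetical field name
        if last is None or last < cutoff:
            stale.append(contact)
    return stale


# Made-up sample records for illustration only.
contacts = [
    {"email": "ana@example.com", "last_engaged": date(2024, 5, 1)},
    {"email": "ben@example.com", "last_engaged": date(2022, 1, 15)},
    {"email": "cy@example.com", "last_engaged": None},
]
stale = find_stale_contacts(contacts, today=date(2024, 6, 1))
```

Whether stale contacts are deleted, suppressed, or sent a re-engagement email is a policy decision for your data hygiene plan.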

Take proactive steps

Dealing with dirty data shouldn’t just be a corrective action.

There are also things you can do to avoid creating those errors in the first place.

For example, duplicate entries tend to happen when teams import lists from multiple sources (e.g., Marketo and Salesforce) without checking for repeat entries.

If you’re importing data, ensure there’s a check in place to flag duplicates.
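
One way such a check could look, sketched in Python: compare each incoming row against the emails already in the database and set repeats aside for review. Matching on a lowercased, trimmed email address alone is an assumption of this sketch. Real deduplication often also compares names or companies.

```python
def normalize_email(email):
    """Lowercase and trim so 'Dave@Example.com ' matches 'dave@example.com'."""
    return email.strip().lower()


def split_new_and_duplicates(existing_emails, incoming_rows):
    """Separate incoming rows into new records and flagged duplicates."""
    seen = {normalize_email(e) for e in existing_emails}
    new_rows, duplicates = [], []
    for row in incoming_rows:
        key = normalize_email(row["email"])
        if key in seen:
            duplicates.append(row)  # already in the database or earlier in this list
        else:
            seen.add(key)
            new_rows.append(row)
    return new_rows, duplicates


# Made-up sample data for illustration only.
existing = ["dave@example.com"]
incoming = [
    {"email": "Dave@Example.com "},
    {"email": "jo@example.com"},
    {"email": "jo@example.com"},
]
new_rows, duplicates = split_new_and_duplicates(existing, incoming)
```

The function flags duplicates instead of dropping them, so a person can decide which copy holds the better information.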

You should also clean up any list (e.g., check if there’s a missing email address) before it gets migrated.
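
A pre-migration check for missing values could be sketched like this. The required_fields default and the row layout (a dictionary per contact with text values) are assumptions for the example.

```python
def split_complete_and_incomplete(rows, required_fields=("email",)):
    """Separate rows that have every required field from rows missing one or more."""
    complete, incomplete = [], []
    for row in rows:
        # Treat absent, None, and whitespace-only values as missing.
        missing = [f for f in required_fields if not (row.get(f) or "").strip()]
        if missing:
            incomplete.append((row, missing))
        else:
            complete.append(row)
    return complete, incomplete


# Made-up sample data for illustration only.
rows = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": "  "},
    {"name": "Cy"},
]
complete, incomplete = split_complete_and_incomplete(rows)
```

Each incomplete row comes back with the list of fields it is missing, which makes it easier to fix the record at the source before it is migrated.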

Lastly, building a process to identify and delete common bogus email addresses (like [email protected] or [email protected]) can help keep your data clean.
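
Such a process could start as simply as the sketch below. The blocklists are illustrative guesses, not a definitive list, and should be replaced with the bogus addresses that actually show up in your own forms.

```python
# Hypothetical blocklists for illustration; tune these to your own data.
BOGUS_LOCAL_PARTS = {"test", "asdf", "none", "noemail"}
BOGUS_DOMAINS = {"example.com", "test.com"}


def looks_bogus(email):
    """Return True for malformed addresses or ones matching a blocklist entry."""
    email = email.strip().lower()
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        return True  # no "@", or nothing before or after it
    return local in BOGUS_LOCAL_PARTS or domain in BOGUS_DOMAINS
```

For instance, looks_bogus("test@acme.io") and looks_bogus("not-an-email") both return True, while looks_bogus("dave@acme.io") returns False. A blocklist like this catches only known patterns, so it works best alongside the periodic audits described earlier.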

Normalize your data

You’ve probably seen that some companies and teams use full country names, like Canada, while others use country codes, like CA.

The best way to keep your data clean is to normalize your entries so there aren’t discrepancies in your data set.
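
As a small sketch of what normalizing could look like for the country example, the Python below maps known variants to one standard value. Standardizing on two-letter codes is an assumption of this example (full names would work just as well), and the alias table is deliberately tiny.

```python
# Hypothetical alias table; extend it with the variants found in your own data.
COUNTRY_ALIASES = {
    "canada": "CA",
    "ca": "CA",
    "united states": "US",
    "usa": "US",
    "us": "US",
}


def normalize_country(value):
    """Map a free-text country value to a two-letter code, or None if unknown."""
    return COUNTRY_ALIASES.get(value.strip().lower())
```

Here normalize_country(" Canada ") and normalize_country("ca") both return "CA". An unrecognized value returns None, so it can be routed to a person for review instead of being guessed at.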

These might sound like small changes, but they’re important ones. Trust me, once you start doing these things, you’ll be able to have a lot more trust in your data.

You’ve got this, and if you need more advice, let us know.

Jo Pulse

P.S. Never miss an update! Follow us on LinkedIn


Jessica Walker