Understanding Data Quality

Spotting Data Problems


You don't need to be a data analyst to spot data quality problems. With practice, you'll develop an instinct for recognizing when something doesn't look right.

The Five Most Common Data Problems

1. Missing Values

What it looks like:

  • Blank cells in spreadsheets
  • "N/A", "NULL", or "-" placeholders
  • Fields showing "Unknown" or "Not Specified"

Business Impact:

  • Incomplete customer profiles for marketing
  • Missing contact info for sales follow-up
  • Gaps in reporting and analytics

Quick Check: In any report, look for rows where key fields are empty. As a rule of thumb, if more than 5-10% of the values in a key field are missing, flag it as a problem.
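If someone on your team can load the report into Python, the same quick check can be sketched in a few lines. This is a minimal illustration, not a standard tool: the `PLACEHOLDERS` set, the function names, and the sample rows are all assumptions made up for this example.

```python
# Strings treated as "missing". This list is an assumption; extend it for your data.
PLACEHOLDERS = {"", "n/a", "null", "-", "unknown", "not specified"}


def is_missing(value):
    """Return True for None, blank strings, and common placeholder text."""
    if value is None:
        return True
    return str(value).strip().lower() in PLACEHOLDERS


def missing_rate(rows, field):
    """Return the fraction of rows (dicts) whose `field` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if is_missing(row.get(field)))
    return missing / len(rows)


# Hypothetical sample data for illustration only.
rows = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": "N/A"},
    {"name": "Cy", "email": ""},
    {"name": "Di", "email": "di@example.com"},
]

rate = missing_rate(rows, "email")
print(f"{rate:.0%} of email values are missing")
```

Counting placeholders such as "N/A" alongside true blanks matters here: a column can look fully populated while still carrying no usable information.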

2. Duplicate Records

What it looks like:

  • Same person appears multiple times
  • Identical transactions recorded twice
  • Slightly different spellings of the same entity

Business Impact:

  • Inflated customer counts and metrics
  • Customers receiving duplicate communications
  • Wasted resources on redundant outreach

Quick Check: Sort by name or email and scan for near-duplicates. Look for variations like:

  • "John Smith" vs "JOHN SMITH" vs "Smith, John"
  • "Acme Corp." vs "Acme Corporation" vs "ACME"
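The name variations above can often be caught by normalizing each value before comparing. The sketch below is one simple way to do that, assuming names arrive as plain strings; `name_key` and `find_duplicate_groups` are hypothetical helper names, not part of any library.

```python
import re
from collections import defaultdict


def name_key(name):
    """Normalize a name: lowercase, flip "Last, First", drop punctuation."""
    name = name.strip().lower()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first.strip()} {last.strip()}"
    name = re.sub(r"[^a-z0-9 ]", "", name)
    return " ".join(name.split())


def find_duplicate_groups(names):
    """Group original spellings that share the same normalized key."""
    groups = defaultdict(list)
    for original in names:
        groups[name_key(original)].append(original)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data for illustration only.
names = ["John Smith", "JOHN SMITH", "Smith, John", "Jane Doe"]
print(find_duplicate_groups(names))
```

This only catches differences in case, punctuation, and name order. Abbreviations such as "Acme Corp." vs "Acme Corporation" need extra rules or fuzzy matching, and any match should still be confirmed by a person before records are merged.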

3. Outdated Information

What it looks like:

  • Last update was months or years ago
  • Addresses, phone numbers, or emails that no longer work
  • Product prices or inventory that don't match current reality

Business Impact:

  • Failed communications
  • Decisions based on stale data
  • Customer frustration

Quick Check: Look for "last updated" timestamps. If critical data hasn't been refreshed in the expected timeframe, flag it.
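As a rough sketch, the timestamp check can be automated by comparing each record's last update against a cutoff. The field name `last_updated`, the 365-day window, and the sample records are assumptions for this example; the right window depends on how quickly your data goes out of date.

```python
from datetime import date, timedelta


def stale_records(records, max_age_days, today):
    """Return records whose last update is older than `max_age_days`."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r["last_updated"] < cutoff]


# Hypothetical sample data for illustration only.
records = [
    {"id": 1, "last_updated": date(2025, 11, 1)},
    {"id": 2, "last_updated": date(2024, 3, 15)},
]

stale = stale_records(records, max_age_days=365, today=date(2025, 12, 31))
print([r["id"] for r in stale])
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check repeatable when you rerun it on an old report.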

4. Inconsistent Formats

What it looks like:

  • Dates in different formats (12/31/2025 vs 2025-12-31 vs "Dec 31")
  • Phone numbers with varying formats (555-1234 vs (555) 123-4567)
  • Currency without clear indicators ($1000 vs 1000 USD vs 1,000)

Business Impact:

  • Errors when combining data from different sources
  • Confusion in reporting
  • Automated processes breaking

Quick Check: Scan a column for format variations. If you see more than one pattern, there's an inconsistency.
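One way to scan a column for format variations without reading every row is to reduce each value to its "shape" (every digit becomes 9, every letter becomes A) and count the distinct shapes. This is an illustrative sketch; `shape` and `format_patterns` are hypothetical names, and only unaccented English letters are handled.

```python
import re
from collections import Counter


def shape(value):
    """Replace digits with 9 and ASCII letters with A, keeping punctuation."""
    digits_masked = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", digits_masked)


def format_patterns(values):
    """Count how many values share each shape."""
    return Counter(shape(v) for v in values)


# Hypothetical sample data for illustration only.
dates = ["12/31/2025", "2025-12-31", "Dec 31", "01/15/2025"]
patterns = format_patterns(dates)

for pattern, count in patterns.most_common():
    print(pattern, count)
```

More than one shape in the output means the column mixes formats. The least common shapes are usually the best place to start investigating.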

5. Obvious Errors

What it looks like:

  • Negative values where only positive should exist (age = -5)
  • Future dates for past events
  • Values that are clearly impossible or implausible (annual salary = $1)

Business Impact:

  • Skewed averages and totals
  • Wrong business decisions
  • Loss of trust in data

Quick Check: Look at minimum and maximum values. Do they make sense? A customer age of 150 or an order quantity of -10 signals a problem.
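The minimum and maximum check can be sketched as a simple range filter. The bounds used here (age between 0 and 120) are an assumption for illustration; sensible limits depend on the field and on your business.

```python
def out_of_range(rows, field, low, high):
    """Return rows whose `field` falls outside the inclusive range [low, high]."""
    return [row for row in rows if not (low <= row[field] <= high)]


# Hypothetical sample data for illustration only.
customers = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": 3, "age": 150},
]

ages = [c["age"] for c in customers]
print("min:", min(ages), "max:", max(ages))

bad = out_of_range(customers, "age", low=0, high=120)
print([c["id"] for c in bad])
```

Printing the minimum and maximum first mirrors the manual check, and the filter then tells you which records to send back for correction.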

Your Data Problem Spotter Checklist

Use this when reviewing any dataset or report:

| Check | What to Look For | Action If Found |
|---|---|---|
| Missing Values | Blank cells, "N/A", placeholders | Ask: Should these be filled? |
| Duplicates | Repeated names, emails, or IDs | Ask: Are these truly different? |
| Staleness | Old timestamps, "last updated" dates | Ask: Is this current enough? |
| Format Issues | Mixed date/phone/currency formats | Ask: Can this cause errors? |
| Obvious Errors | Impossible values, negative where wrong | Ask: What went wrong? |

Real-World Example

Imagine you receive a customer report with 10,000 records. Here's what a quick scan might reveal:

| Issue Found | Count | Severity |
|---|---|---|
| Missing email addresses | 1,200 (12%) | High: can't reach these customers |
| Duplicate phone numbers | 89 pairs | Medium: possible duplicate customers |
| Last updated > 1 year | 3,400 (34%) | High: stale contact info |
| Invalid date of birth | 45 records | Low: edge case errors |

Your response: Before using this data, raise these issues with the data team and ask for cleanup or verification.

When to Escalate

Not all problems require immediate action. Use this guide:

| Severity | Criteria | Action |
|---|---|---|
| Critical | Affects >20% of data or key decisions | Stop and escalate immediately |
| High | Affects 5-20% or important segments | Flag before proceeding |
| Medium | Affects <5% or non-critical fields | Note and monitor |
| Low | Isolated edge cases | Document for future cleanup |
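The percentage thresholds in the escalation guide can be turned into a small helper, sketched below. It covers only the share of affected records. The guide does not say where "isolated edge cases" begin, so the 1% cutoff for Low is an assumption, as is treating exactly 5% and exactly 20% as High. The qualitative criteria (key decisions, important segments, non-critical fields) still need human judgment and can raise or lower the result.

```python
def severity(affected_pct, low_cutoff=1.0):
    """Map the percentage of affected records to a severity label.

    `low_cutoff` (default 1%) is an assumed boundary between
    Medium and Low; the escalation guide does not define one.
    """
    if affected_pct > 20:
        return "Critical"
    if affected_pct >= 5:
        return "High"
    if affected_pct >= low_cutoff:
        return "Medium"
    return "Low"


# Hypothetical percentages for illustration only.
for pct in (25, 12, 2, 0.45):
    print(f"{pct}% affected -> {severity(pct)}")
```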

Key Insight: The goal isn't perfect data—it's data that's good enough for your specific purpose. A 95% complete dataset might be perfectly usable for trend analysis but inadequate for individual customer outreach.

Next: Learn the exact questions to ask data teams when you spot problems.
