Understanding Data Quality

Spotting Data Problems

You don't need to be a data analyst to spot data quality problems. With practice, you'll develop an instinct for recognizing when something doesn't look right.

The Five Most Common Data Problems

1. Missing Values

What it looks like:

  • Blank cells in spreadsheets
  • "N/A", "NULL", or "-" placeholders
  • Fields showing "Unknown" or "Not Specified"

Business Impact:

  • Incomplete customer profiles for marketing
  • Missing contact info for sales follow-up
  • Gaps in reporting and analytics

Quick Check: In any report, look for rows where key fields are empty. As a rule of thumb, if more than roughly 5-10% of rows are missing a key field, treat it as a problem.
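
If the report can be exported to a file, the same check can be automated. The sketch below is one possible approach, not a standard tool: it assumes rows are loaded as Python dictionaries, and the list of placeholder strings and the function names (`is_missing`, `missing_rate`) are illustrative assumptions.

```python
# Placeholder strings that usually mean "no value".
# This list is an assumption; extend it to match your own data.
MISSING_MARKERS = {"", "n/a", "null", "-", "unknown", "not specified"}

def is_missing(value):
    """Return True for None, blank strings, and common placeholder text."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_MARKERS

def missing_rate(rows, field):
    """Fraction of rows (dicts) whose `field` is missing; 0.0 for an empty list."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if is_missing(row.get(field)))
    return missing / len(rows)

# Made-up sample data for illustration.
customers = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": "N/A"},
    {"name": "Cy", "email": ""},
    {"name": "Di", "email": "di@example.com"},
]
rate = missing_rate(customers, "email")
print(f"{rate:.0%} of email values are missing")  # prints "50% of email values are missing"
```

Counting placeholders such as "N/A" as missing matters: a column can look fully populated while still holding no usable values.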

2. Duplicate Records

What it looks like:

  • Same person appears multiple times
  • Identical transactions recorded twice
  • Slightly different spellings of the same entity

Business Impact:

  • Inflated customer counts and metrics
  • Customers receiving duplicate communications
  • Wasted resources on redundant outreach

Quick Check: Sort by name or email and scan for near-duplicates. Look for variations like:

  • "John Smith" vs "JOHN SMITH" vs "Smith, John"
  • "Acme Corp." vs "Acme Corporation" vs "ACME"
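
Variations like these can be grouped automatically by reducing each name to a simplified comparison key. The following is a minimal sketch under stated assumptions: the suffix list is incomplete and hypothetical, and sorting the words in a name can occasionally group two genuinely different entities, so treat the output as candidates to review rather than confirmed duplicates.

```python
import re
from collections import defaultdict

# Company suffixes to ignore when comparing names (an assumed, incomplete list).
SUFFIXES = {"corp", "corporation", "inc", "ltd", "llc", "co"}

def normalize_name(name):
    """Reduce a name to a comparison key: lowercase, no punctuation, sorted words."""
    words = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    words = [w for w in words if w not in SUFFIXES]
    return " ".join(sorted(words))

def find_near_duplicates(names):
    """Group names that share a comparison key; return groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

names = [
    "John Smith", "JOHN SMITH", "Smith, John",
    "Acme Corp.", "Acme Corporation", "ACME",
    "Jane Doe",
]
duplicates = find_near_duplicates(names)
for group in duplicates:
    print(group)
```

Here the three "John Smith" spellings land in one group and the three "Acme" spellings in another, while "Jane Doe" is left alone.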

3. Outdated Information

What it looks like:

  • Last update was months or years ago
  • Addresses, phone numbers, or emails that no longer work
  • Product prices or inventory that don't match current reality

Business Impact:

  • Failed communications
  • Decisions based on stale data
  • Customer frustration

Quick Check: Look for "last updated" timestamps. If critical data hasn't been refreshed in the expected timeframe, flag it.
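
As a rough illustration, the timestamp check can be written in a few lines. The field name `last_updated` and the 365-day default are assumptions chosen for this sketch; the right threshold depends on how often the data is expected to change.

```python
from datetime import date, timedelta

def stale_records(rows, today, max_age_days=365):
    """Return rows whose `last_updated` date is older than `max_age_days`."""
    cutoff = today - timedelta(days=max_age_days)
    return [row for row in rows if row["last_updated"] < cutoff]

# Made-up sample data for illustration.
records = [
    {"id": 1, "last_updated": date(2025, 11, 1)},
    {"id": 2, "last_updated": date(2023, 6, 15)},
    {"id": 3, "last_updated": date(2024, 12, 30)},
]
stale = stale_records(records, today=date(2025, 12, 31))
print([row["id"] for row in stale])  # prints [2, 3]
```

Passing `today` in explicitly, instead of reading the clock inside the function, keeps the result repeatable when the check is rerun later.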

4. Inconsistent Formats

What it looks like:

  • Dates in different formats (12/31/2025 vs 2025-12-31 vs "Dec 31")
  • Phone numbers with varying formats (555-1234 vs (555) 123-4567)
  • Currency amounts with inconsistent symbols, codes, or separators ($1000 vs 1000 USD vs 1,000)

Business Impact:

  • Errors when combining data from different sources
  • Confusion in reporting
  • Automated processes breaking

Quick Check: Scan a column for format variations. If you see more than one pattern, there's an inconsistency.
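
One way to scan a long column without reading every value is to reduce each value to its "shape" and count the distinct shapes. This is a sketch of that idea, with `shape` and `format_patterns` as hypothetical helper names: every digit becomes 9, every letter becomes A, and punctuation is kept.

```python
import re
from collections import Counter

def shape(value):
    """Map a value to its pattern: digits become 9, letters become A, the rest is kept."""
    value = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", value)

def format_patterns(values):
    """Count how many values share each pattern."""
    return Counter(shape(v) for v in values)

# Made-up sample column for illustration.
dates = ["12/31/2025", "01/15/2025", "2025-12-31", "Dec 31"]
patterns = format_patterns(dates)
for pattern, count in patterns.most_common():
    print(pattern, count)
```

A clean column produces a single pattern. The sample above produces three (99/99/9999, 9999-99-99, and AAA 99), which is the signal that formats are mixed.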

5. Obvious Errors

What it looks like:

  • Negative values where only positive should exist (age = -5)
  • Future dates for past events
  • Values that are clearly implausible for the field (annual salary = $1)

Business Impact:

  • Skewed averages and totals
  • Wrong business decisions
  • Loss of trust in data

Quick Check: Look at minimum and maximum values. Do they make sense? A customer age of 150 or an order quantity of -10 signals a problem.
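
The minimum and maximum check amounts to comparing each value against a range you consider plausible. In the sketch below, the limits in `VALID_RANGES` are assumptions made up for illustration; the sensible range for each field is a business decision.

```python
# Plausible ranges per field (assumed limits for illustration; set your own).
VALID_RANGES = {"age": (0, 120), "order_quantity": (1, 10_000)}

def out_of_range(rows, ranges=VALID_RANGES):
    """Return (row index, field, value) for every value outside its allowed range."""
    problems = []
    for index, row in enumerate(rows):
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((index, field, value))
    return problems

# Made-up sample data for illustration.
orders = [
    {"age": 34, "order_quantity": 2},
    {"age": 150, "order_quantity": 1},
    {"age": 41, "order_quantity": -10},
]
problems = out_of_range(orders)
print(problems)  # prints [(1, 'age', 150), (2, 'order_quantity', -10)]
```

Missing values are skipped here on purpose, since they are a separate problem with their own check.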

Your Data Problem Spotter Checklist

Use this when reviewing any dataset or report:

| Check | What to Look For | Action If Found |
|---|---|---|
| Missing Values | Blank cells, "N/A", placeholders | Ask: Should these be filled? |
| Duplicates | Repeated names, emails, or IDs | Ask: Are these truly different? |
| Staleness | Old timestamps, "last updated" dates | Ask: Is this current enough? |
| Format Issues | Mixed date/phone/currency formats | Ask: Can this cause errors? |
| Obvious Errors | Impossible values, negative where wrong | Ask: What went wrong? |

Real-World Example

Imagine you receive a customer report with 10,000 records. Here's what a quick scan might reveal:

| Issue Found | Count | Severity |
|---|---|---|
| Missing email addresses | 1,200 (12%) | High: can't reach these customers |
| Duplicate phone numbers | 89 pairs | Medium: possible duplicate customers |
| Last updated > 1 year | 3,400 (34%) | High: stale contact info |
| Invalid date of birth | 45 records | Low: edge case errors |

Your response: Before using this data, raise these issues with the data team and ask for cleanup or verification.

When to Escalate

Not all problems require immediate action. Use this guide:

| Severity | Criteria | Action |
|---|---|---|
| Critical | Affects >20% of data or key decisions | Stop and escalate immediately |
| High | Affects 5-20% or important segments | Flag before proceeding |
| Medium | Affects <5% or non-critical fields | Note and monitor |
| Low | Isolated edge cases | Document for future cleanup |
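
The percentage thresholds in this guide can be turned into a small helper. This sketch covers only the numeric part of the criteria. The guide gives no number for separating Medium from Low, so the 1% cutoff below is an assumption, as are the function name `severity` and the `key_decision` flag; judging whether a field or segment is important still takes a person.

```python
def severity(affected, total, key_decision=False):
    """Map the share of affected records to a level from the escalation guide."""
    if total <= 0:
        raise ValueError("total must be positive")
    share = affected * 100 / total  # percentage of records affected
    if key_decision or share > 20:
        return "Critical"
    if share >= 5:
        return "High"
    if share >= 1:  # assumed cutoff between Medium and Low; the guide gives none
        return "Medium"
    return "Low"

print(severity(1200, 10_000))  # 12% affected, prints "High"
print(severity(45, 10_000))    # 0.45% affected, prints "Low"
```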

Key Insight: The goal isn't perfect data—it's data that's good enough for your specific purpose. A 95% complete dataset might be perfectly usable for trend analysis but inadequate for individual customer outreach.

Next: Learn the exact questions to ask data teams when you spot problems.