Understanding Data Quality
Spotting Data Problems
You don't need to be a data analyst to spot data quality problems. With practice, you'll develop an instinct for recognizing when something doesn't look right.
The Five Most Common Data Problems
1. Missing Values
What it looks like:
- Blank cells in spreadsheets
- "N/A", "NULL", or "-" placeholders
- Fields showing "Unknown" or "Not Specified"
Business Impact:
- Incomplete customer profiles for marketing
- Missing contact info for sales follow-up
- Gaps in reporting and analytics
Quick Check: In any report, look for rows where key fields are empty or hold a placeholder. As a rule of thumb, if more than 5-10% of the values in a key field are missing, raise it as a problem.
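If the report can be exported as rows, this check is easy to automate. The sketch below is one minimal way to do it in Python, assuming each row is a dictionary. The `MISSING_MARKERS` list, the function names, and the sample customers are illustrative assumptions, not part of any particular tool.

```python
# Placeholder strings treated as "missing" (an assumption; adjust to your data).
MISSING_MARKERS = {"", "n/a", "null", "-", "unknown", "not specified"}


def is_missing(value):
    """Return True for None, blank strings, and common placeholders."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_MARKERS


def missing_rate(records, field):
    """Fraction of records (0.0 to 1.0) where `field` is missing."""
    if not records:
        return 0.0
    missing = sum(1 for row in records if is_missing(row.get(field)))
    return missing / len(records)


# Hypothetical sample rows, for illustration only.
customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "Ben Cole", "email": "N/A"},
    {"name": "Cy Young", "email": ""},
    {"name": "Dee Park", "email": "dee@example.com"},
]

rate = missing_rate(customers, "email")
print(f"email missing: {rate:.0%}")
```

In this sample, two of the four emails count as missing, which is far above the 5-10% rule of thumb. Counting placeholders such as "N/A" alongside blanks matters, because a field can look filled in and still carry no information.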
2. Duplicate Records
What it looks like:
- Same person appears multiple times
- Identical transactions recorded twice
- Slightly different spellings of the same entity
Business Impact:
- Inflated customer counts and metrics
- Customers receiving duplicate communications
- Wasted resources on redundant outreach
Quick Check: Sort by name or email and scan for near-duplicates. Look for variations like:
- "John Smith" vs "JOHN SMITH" vs "Smith, John"
- "Acme Corp." vs "Acme Corporation" vs "ACME"
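Sorting and scanning by eye works for small lists. For larger ones, a rough sketch like the following can group likely duplicates for review. It assumes that differences in case, punctuation, "Last, First" ordering, and a short list of company suffixes are the only variations. The `SUFFIXES` set and both function names are hypothetical.

```python
import re
from collections import defaultdict

# Company suffixes to ignore when comparing (an illustrative, incomplete list).
SUFFIXES = {"corp", "corporation", "inc", "llc", "ltd", "co"}


def normalize_name(name):
    """Reduce a name to a comparison key: case, punctuation, order, suffixes."""
    # Turn "Smith, John" into "John Smith" before comparing.
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    # Lowercase, replace punctuation with spaces, then split into words.
    words = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    words = [w for w in words if w not in SUFFIXES]
    return " ".join(words)


def find_near_duplicates(names):
    """Group names that share a normalized key; return groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "John Smith", "JOHN SMITH", "Smith, John",
    "Acme Corp.", "Acme Corporation", "ACME",
    "Jane Doe",
]
for group in find_near_duplicates(names):
    print(group)
```

This groups the three "John Smith" spellings together and the three "Acme" variants together, and leaves "Jane Doe" alone. Treat the groups as candidates for review, not as confirmed duplicates: two different people can share a name, so a human still needs to ask whether the records are truly the same.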
3. Outdated Information
What it looks like:
- Last update was months or years ago
- Addresses, phone numbers, or emails that no longer work
- Product prices or inventory that don't match current reality
Business Impact:
- Failed communications
- Decisions based on stale data
- Customer frustration
Quick Check: Look for "last updated" timestamps. If critical data hasn't been refreshed in the expected timeframe, flag it.
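The timestamp check can be sketched in a few lines of Python. This example assumes each record carries a `last_updated` date and that "the expected timeframe" is expressed as a maximum age in days. The field name, the 365-day limit, and the sample records are assumptions for illustration.

```python
from datetime import date, timedelta


def find_stale(records, max_age_days, today=None):
    """Return records whose `last_updated` date is older than max_age_days."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    return [row for row in records if row["last_updated"] < cutoff]


# Hypothetical records; a fixed "today" keeps the example reproducible.
records = [
    {"id": 1, "last_updated": date(2025, 11, 30)},
    {"id": 2, "last_updated": date(2024, 6, 15)},
    {"id": 3, "last_updated": date(2023, 1, 10)},
]

stale = find_stale(records, max_age_days=365, today=date(2025, 12, 31))
print([row["id"] for row in stale])
```

Here records 2 and 3 are flagged because they were last updated more than a year before the reference date. The right limit depends on the data: prices may need refreshing daily, while mailing addresses may be fine for a year.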
4. Inconsistent Formats
What it looks like:
- Dates in different formats (12/31/2025 vs 2025-12-31 vs "Dec 31")
- Phone numbers with varying formats (555-1234 vs (555) 123-4567)
- Currency without clear indicators ($1000 vs 1000 USD vs 1,000)
Business Impact:
- Errors when combining data from different sources
- Confusion in reporting
- Automated processes breaking
Quick Check: Scan a column for format variations. If you see more than one pattern, there's an inconsistency.
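One way to scan a column for format variations without reading every value is to reduce each value to its "shape" and count the distinct shapes. The sketch below assumes the values are text and uses a simple convention (digits become `9`, letters become `A`). The function names are hypothetical, and the sample dates are modeled on the examples above.

```python
import re
from collections import Counter


def format_pattern(value):
    """Replace every digit with 9 and every letter with A, keeping punctuation."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def pattern_counts(values):
    """Count how many values share each format pattern."""
    return Counter(format_pattern(v) for v in values)


dates = ["12/31/2025", "2025-12-31", "Dec 31", "01/15/2025"]
counts = pattern_counts(dates)
for pattern, count in counts.most_common():
    print(pattern, count)
```

The sample column produces three patterns (`99/99/9999`, `9999-99-99`, and `AAA 99`), so it fails the "more than one pattern" test. A consistent shape does not prove the values mean the same thing: `01/02/2025` could be January 2 or February 1 depending on the source.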
5. Obvious Errors
What it looks like:
- Negative values where only positive should exist (age = -5)
- Future dates for past events
- Values that are clearly implausible (salary = $1)
Business Impact:
- Skewed averages and totals
- Wrong business decisions
- Loss of trust in data
Quick Check: Look at minimum and maximum values. Do they make sense? A customer age of 150 or an order quantity of -10 signals a problem.
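The minimum and maximum check translates directly into code. In this minimal sketch, the plausible range of 0 to 120 for age is an assumption chosen for illustration, as are the function name and sample values. Pick bounds that fit each field.

```python
def out_of_range(values, low, high):
    """Return the values that fall outside the inclusive range [low, high]."""
    return [v for v in values if v < low or v > high]


# Hypothetical customer ages, including two obvious errors.
ages = [34, 27, -5, 150, 61]

print("min:", min(ages), "max:", max(ages))
print("suspect ages:", out_of_range(ages, low=0, high=120))
```

The minimum of -5 and the maximum of 150 both fall outside the assumed range, so both are reported as suspect. The same function works for order quantities, prices, or any other numeric field once sensible limits are agreed.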
Your Data Problem Spotter Checklist
Use this when reviewing any dataset or report:
| Check | What to Look For | Action If Found |
|---|---|---|
| Missing Values | Blank cells, "N/A", placeholders | Ask: Should these be filled? |
| Duplicates | Repeated names, emails, or IDs | Ask: Are these truly different? |
| Staleness | Old timestamps, "last updated" dates | Ask: Is this current enough? |
| Format Issues | Mixed date/phone/currency formats | Ask: Can this cause errors? |
| Obvious Errors | Impossible values, negative where wrong | Ask: What went wrong? |
Real-World Example
Imagine you receive a customer report with 10,000 records. Here's what a quick scan might reveal:
| Issue Found | Count | Severity |
|---|---|---|
| Missing email addresses | 1,200 (12%) | High: can't reach these customers |
| Duplicate phone numbers | 89 pairs | Medium: possible duplicate customers |
| Last updated > 1 year | 3,400 (34%) | Critical: stale contact info in over 20% of records |
| Invalid date of birth | 45 records | Low: edge case errors |
Your response: Before using this data, raise these issues with the data team and ask for cleanup or verification.
When to Escalate
Not all problems require immediate action. Use this guide:
| Severity | Criteria | Action |
|---|---|---|
| Critical | Affects >20% of data or key decisions | Stop and escalate immediately |
| High | Affects 5-20% or important segments | Flag before proceeding |
| Medium | Affects <5% or non-critical fields | Note and monitor |
| Low | Isolated edge cases | Document for future cleanup |
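The percentage part of this guide can be written as a small function. This is only a sketch: the 20% and 5% thresholds come from the table, but the 1% cutoff standing in for "isolated edge cases" is an assumption, as are the function name and the sample counts. It cannot judge whether an issue touches key decisions or important segments, so a person should still be free to raise the severity.

```python
def severity(affected, total, low_cutoff=0.01):
    """Map the share of affected records to a severity label.

    The 20% and 5% thresholds follow the table; `low_cutoff` (1%) is an
    assumed stand-in for "isolated edge cases".
    """
    if total <= 0:
        raise ValueError("total must be a positive number of records")
    share = affected / total
    if share > 0.20:
        return "Critical"
    if share >= 0.05:
        return "High"
    if share >= low_cutoff:
        return "Medium"
    return "Low"


# Hypothetical issue counts in a 10,000-record dataset.
for affected in (2500, 800, 200, 30):
    print(affected, severity(affected, total=10_000))
```

With these counts, 2,500 affected records (25%) rate as Critical, 800 (8%) as High, 200 (2%) as Medium, and 30 (0.3%) as Low.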
Key Insight: The goal isn't perfect data—it's data that's good enough for your specific purpose. A 95% complete dataset might be perfectly usable for trend analysis but inadequate for individual customer outreach.
Next: Learn the exact questions to ask data teams when you spot problems.