Understanding Data Quality
The Six Data Quality Dimensions
Not all data is created equal. The DAMA (Data Management Association) framework defines six dimensions that determine whether data is fit for use. Understanding these dimensions helps you spot problems before they lead to bad decisions.
The DAMA Framework
| Dimension | Question It Answers |
|---|---|
| Accuracy | Does this data correctly represent reality? |
| Completeness | Are all required values present? |
| Consistency | Does the data match across all systems? |
| Timeliness | Is the data current enough for my needs? |
| Uniqueness | Are there any duplicate records? |
| Validity | Does the data conform to expected formats and rules? |
Dimension 1: Accuracy
Definition: Data correctly represents the real-world entity or event it describes.
Business Example:
- ✅ Customer's address matches their actual location
- ❌ Customer's email bounces because it was entered incorrectly
Red Flags:
- Returned mail or bounced emails
- Customers complaining about wrong information
- Numbers that don't match source documents
How to Check: Verify a sample of records against source documents or real-world confirmation.
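A spot check like this could be sketched in Python as follows. The record layout, the field names, and the `source_of_truth` dictionary are hypothetical. They stand in for whatever system and source documents you actually have.

```python
import random

# Hypothetical data: values held in our system vs. values confirmed from source documents.
system_records = {
    101: {"email": "ana@example.com", "city": "Lisbon"},
    102: {"email": "bob@exmaple.com", "city": "Austin"},  # typo in the domain
    103: {"email": "chen@example.com", "city": "Toronto"},
    104: {"email": "dana@example.com", "city": "Oslo"},   # customer has moved
}
source_of_truth = {
    101: {"email": "ana@example.com", "city": "Lisbon"},
    102: {"email": "bob@example.com", "city": "Austin"},
    103: {"email": "chen@example.com", "city": "Toronto"},
    104: {"email": "dana@example.com", "city": "Bergen"},
}

def accuracy_spot_check(records, truth, sample_size, seed=0):
    """Compare a random sample of records with the verified source.

    Returns (accuracy_percent, mismatches), where mismatches is a list of
    (record_id, field, system_value, verified_value) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    sample_ids = rng.sample(sorted(records), min(sample_size, len(records)))
    mismatches = []
    for record_id in sample_ids:
        for field, verified in truth[record_id].items():
            found = records[record_id].get(field)
            if found != verified:
                mismatches.append((record_id, field, found, verified))
    inaccurate_ids = {m[0] for m in mismatches}
    accuracy = 100 * (len(sample_ids) - len(inaccurate_ids)) / len(sample_ids)
    return accuracy, mismatches

accuracy, mismatches = accuracy_spot_check(system_records, source_of_truth, sample_size=4)
print(f"Accuracy in sample: {accuracy:.0f}%")
for record_id, field, found, verified in mismatches:
    print(f"Record {record_id}: {field} is {found!r}, source says {verified!r}")
```

A record counts as inaccurate here if any checked field disagrees with the source. Scoring field by field instead would be an equally reasonable choice.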
Dimension 2: Completeness
Definition: All required data is present—no missing values in critical fields.
Business Example:
- ✅ Every order has shipping address, contact info, and payment details
- ❌ 20% of customer records are missing phone numbers
Red Flags:
- Blank fields in required columns
- "N/A" or placeholder text in important fields
- Reports showing "Unknown" categories
Simple Calculation:
Completeness = (Records with all required fields / Total records) × 100%
Example: 950 complete out of 1,000 = 95% completeness
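The same calculation can be automated. The sketch below is a minimal illustration. The required fields and the list of placeholder values are assumptions that you would replace with your own rules.

```python
REQUIRED_FIELDS = ("name", "email", "phone")   # hypothetical required columns
PLACEHOLDERS = {"", "n/a", "unknown", "none"}  # values treated as missing (assumption)

def is_complete(record, required=REQUIRED_FIELDS):
    """True when every required field holds a real, non-placeholder value."""
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip().lower() in PLACEHOLDERS:
            return False
    return True

def completeness(records, required=REQUIRED_FIELDS):
    """Completeness = (records with all required fields / total records) x 100."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if is_complete(record, required))
    return 100 * complete / len(records)

customers = [
    {"name": "Ana", "email": "ana@example.com", "phone": "555-0101"},
    {"name": "Bob", "email": "bob@example.com", "phone": "N/A"},  # placeholder text
    {"name": "Chen", "email": "", "phone": "555-0103"},           # blank field
    {"name": "Dana", "email": "dana@example.com", "phone": "555-0104"},
]
print(f"Completeness: {completeness(customers):.0f}%")
```

Treating "N/A" and "Unknown" as missing matters: a field that is technically filled in but carries no information would otherwise inflate the score.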
Dimension 3: Consistency
Definition: The same fact is recorded the same way in every system that stores it.
Business Example:
- ✅ Customer name in CRM matches name in billing system
- ❌ Sales says "Acme Corp" but Finance says "ACME Corporation"
Red Flags:
- Different totals in different reports for the same metric
- Name/address variations across systems
- Conflicting information when joining datasets
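One way to look for these mismatches is to compare the same field across two systems, separating pure formatting differences from real conflicts. In the sketch below, the `crm` and `billing` dictionaries and the suffix table are hypothetical, and the normalization rules are deliberately simple.

```python
import re

# Hypothetical table of company-suffix abbreviations to expand before comparing.
SUFFIXES = {"corp": "corporation", "inc": "incorporated", "co": "company", "ltd": "limited"}

def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and expand suffix abbreviations."""
    words = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)

def compare_systems(crm, billing):
    """Label each customer ID found in both systems as 'match', 'format_only', or 'conflict'."""
    results = {}
    for customer_id in crm.keys() & billing.keys():
        crm_name, billing_name = crm[customer_id], billing[customer_id]
        if crm_name == billing_name:
            results[customer_id] = "match"
        elif normalize_name(crm_name) == normalize_name(billing_name):
            results[customer_id] = "format_only"  # same entity, different spelling
        else:
            results[customer_id] = "conflict"     # needs a human to resolve
    return results

crm = {1: "Acme Corp", 2: "Globex Inc.", 3: "Initech"}
billing = {1: "ACME Corporation", 2: "Globex Inc.", 3: "Initrode"}
for customer_id, status in sorted(compare_systems(crm, billing).items()):
    print(customer_id, status)
```

A "format_only" result points to a missing naming standard, while a "conflict" means at least one system holds wrong data.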
Dimension 4: Timeliness
Definition: Data is current enough to be useful for its intended purpose.
Business Example:
- ✅ Inventory count updated today for fulfillment decisions
- ❌ Using last month's customer list for a time-sensitive campaign
Red Flags:
- Old timestamps on critical records
- Decisions based on "when we last checked"
- Stale reports being treated as current
Key Question: How fresh does this data need to be for my decision?
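Once you have answered that question for each dataset, the check itself is a date comparison. The freshness limits in `MAX_AGE` below are made-up examples, and `now` is fixed so the result is repeatable. A real check would use the current time instead.

```python
from datetime import datetime, timedelta

# Hypothetical freshness requirements: the maximum acceptable age per dataset.
MAX_AGE = {
    "inventory": timedelta(hours=24),    # fulfillment needs same-day counts
    "customer_list": timedelta(days=7),  # campaigns need a list from this week
}

def is_stale(dataset, last_updated, now):
    """True when the data is older than the maximum age allowed for its use."""
    return now - last_updated > MAX_AGE[dataset]

now = datetime(2025, 1, 12, 9, 0)  # fixed for the example; use datetime.now() in practice
last_updates = {
    "inventory": datetime(2025, 1, 12, 6, 30),      # refreshed this morning
    "customer_list": datetime(2024, 12, 10, 8, 0),  # about a month old
}
for dataset, updated in last_updates.items():
    status = "STALE" if is_stale(dataset, updated, now) else "fresh"
    print(f"{dataset}: last updated {updated:%Y-%m-%d %H:%M} -> {status}")
```

The limit belongs to the decision, not the data: the same month-old customer list might be fine for an annual report and unusable for a time-sensitive campaign.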
Dimension 5: Uniqueness
Definition: Each real-world entity is represented only once—no duplicate records.
Business Example:
- ✅ One customer record per customer
- ❌ "John Smith" appears three times with different contact info
Red Flags:
- Customer receiving multiple copies of the same email
- Counts or totals that are higher than expected because records are double-counted
- Conflicting information for the same entity
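A simple duplicate check groups records by a key that should identify one real-world entity. The sketch below assumes, purely for illustration, that a normalized name plus a `birth_date` field identifies a customer. Real matching rules are usually fuzzier and depend on which fields you trust.

```python
from collections import defaultdict

def match_key(record):
    """Comparison key: name with case and extra spaces removed, plus birth date."""
    name = " ".join(record["name"].lower().split())
    return (name, record["birth_date"])

def find_duplicates(records):
    """Group record IDs that share a match key; return only groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

def uniqueness(records):
    """Distinct entities as a percentage of total records."""
    if not records:
        return 100.0
    distinct = len({match_key(record) for record in records})
    return 100 * distinct / len(records)

customers = [
    {"id": 1, "name": "John Smith", "birth_date": "1985-03-14", "email": "john@example.com"},
    {"id": 2, "name": "john  smith ", "birth_date": "1985-03-14", "email": "jsmith@example.org"},
    {"id": 3, "name": "JOHN SMITH", "birth_date": "1985-03-14", "email": "j.smith@example.net"},
    {"id": 4, "name": "Maria Lopez", "birth_date": "1990-07-02", "email": "maria@example.com"},
]
print("Duplicate groups:", find_duplicates(customers))
print(f"Uniqueness: {uniqueness(customers):.0f}%")
```

Groups found this way are candidates for review, not proof: two different people can share a name and a birth date, so a person should confirm each merge.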
Dimension 6: Validity
Definition: Data conforms to defined business rules and formats.
Business Example:
- ✅ Email addresses contain "@" and a domain
- ❌ Age field contains "thirty-two" instead of "32"
Red Flags:
- Invalid formats (phone numbers with letters)
- Values outside acceptable ranges (age = -5)
- Dates in inconsistent formats (12/01/2025 vs 2025-01-12)
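Validity rules translate directly into small checks, one per field. The rules below are illustrative assumptions, not a standard: a loose email pattern, an age range of 0 to 120, and ISO-style dates (YYYY-MM-DD). The field names are hypothetical.

```python
import re
from datetime import datetime

# Loose format check only: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def valid_email(value):
    return isinstance(value, str) and bool(EMAIL_PATTERN.match(value))

def valid_age(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 120

def valid_iso_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False

RULES = {"email": valid_email, "age": valid_age, "signup_date": valid_iso_date}

def validate(record):
    """Return the names of the fields that break a rule, sorted alphabetically."""
    return sorted(field for field, rule in RULES.items() if not rule(record.get(field)))

records = [
    {"email": "ana@example.com", "age": 32, "signup_date": "2025-01-12"},
    {"email": "bob.example.com", "age": "thirty-two", "signup_date": "12/01/2025"},
    {"email": "chen@example.com", "age": -5, "signup_date": "2025-01-12"},
]
for record in records:
    print(record["email"], "->", validate(record) or "valid")
```

Passing these checks shows only that a value is well-formed. A correctly formatted email address can still belong to nobody, which is an accuracy problem rather than a validity one.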
Quick Reference Card
| Dimension | Check By Looking For |
|---|---|
| Accuracy | Mismatches against source documents |
| Completeness | Missing values, blank fields |
| Consistency | Mismatches across systems |
| Timeliness | Stale timestamps, old data |
| Uniqueness | Duplicate records |
| Validity | Format errors, rule violations |
Remember: Poor data quality has a real cost. Companies lose an average of 43 hours per employee yearly due to data issues.
Next: Learn to spot common data problems that indicate quality issues.