Understanding Data Quality

The Six Data Quality Dimensions


Not all data is created equal. The DAMA (Data Management Association) framework defines six dimensions that determine whether data is fit for use. Understanding these dimensions helps you spot problems before they lead to bad decisions.

The DAMA Framework

Dimension    | Question It Answers
Accuracy     | Does this data correctly represent reality?
Completeness | Are all required values present?
Consistency  | Does the data match across all systems?
Timeliness   | Is the data current enough for my needs?
Uniqueness   | Are there any duplicate records?
Validity     | Does the data conform to expected formats and rules?

Dimension 1: Accuracy

Definition: Data correctly represents the real-world entity or event it describes.

Business Example:

  • ✅ Customer's address matches their actual location
  • ❌ Customer's email bounces because it was entered incorrectly

Red Flags:

  • Returned mail or bounced emails
  • Customers complaining about wrong information
  • Numbers that don't match source documents

How to Check: Verify a sample of records against source documents or real-world confirmation.
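
The sampling check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `sample_accuracy` helper, and the `verified` lookup (standing in for values confirmed against source documents) are all hypothetical.

```python
import random

def sample_accuracy(records, source_of_truth, key, field, sample_size, seed=0):
    """Estimate accuracy (%) by comparing a random sample to a trusted source."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    sample = rng.sample(records, min(sample_size, len(records)))
    if not sample:
        return 0.0
    matches = sum(
        1 for rec in sample
        if source_of_truth.get(rec[key]) == rec[field]
    )
    return 100.0 * matches / len(sample)

# Hypothetical CRM records and the addresses confirmed from source documents.
crm = [
    {"id": 1, "address": "12 Oak St"},
    {"id": 2, "address": "99 Elm Ave"},
    {"id": 3, "address": "5 Pine Rd"},
    {"id": 4, "address": "7 Birch Ln"},
]
verified = {1: "12 Oak St", 2: "99 Elm Avenue", 3: "5 Pine Rd", 4: "7 Birch Ln"}

# Sampling all four records: three match the verified source, one does not.
print(sample_accuracy(crm, verified, "id", "address", sample_size=4))  # 75.0
```

In practice the sample would be much smaller than the full dataset, and the "source of truth" is often a manual check rather than a lookup table.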

Dimension 2: Completeness

Definition: All required data is present—no missing values in critical fields.

Business Example:

  • ✅ Every order has shipping address, contact info, and payment details
  • ❌ 20% of customer records are missing phone numbers

Red Flags:

  • Blank fields in required columns
  • "N/A" or placeholder text in important fields
  • Reports showing "Unknown" categories

Simple Calculation:

Completeness = (Records with all required fields / Total records) × 100%
Example: 950 complete out of 1,000 = 95% completeness
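
As a rough sketch, the same calculation could be run over a list of records in Python. The field names and the set of placeholder values treated as "missing" are assumptions made for this example, not a standard.

```python
# Assumption: these strings count as missing, in addition to None.
PLACEHOLDERS = {"", "n/a", "unknown"}

def is_filled(value):
    """True if the value is present and not a placeholder."""
    if value is None:
        return False
    return str(value).strip().lower() not in PLACEHOLDERS

def completeness(records, required_fields):
    """Percentage of records in which every required field is filled."""
    if not records:
        return 0.0
    complete = sum(
        1 for rec in records
        if all(is_filled(rec.get(field)) for field in required_fields)
    )
    return 100.0 * complete / len(records)

# Hypothetical orders: two are complete, one has a blank, one has "N/A".
orders = [
    {"shipping_address": "12 Oak St", "contact": "a@example.com", "payment": "card"},
    {"shipping_address": "", "contact": "b@example.com", "payment": "card"},
    {"shipping_address": "5 Pine Rd", "contact": "N/A", "payment": "invoice"},
    {"shipping_address": "7 Birch Ln", "contact": "d@example.com", "payment": "card"},
]

print(completeness(orders, ["shipping_address", "contact", "payment"]))  # 50.0
```

Treating "N/A" and "Unknown" as missing matches the red flags listed above; a stricter or looser placeholder list would change the score.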

Dimension 3: Consistency

Definition: The same data is represented the same way in every system that stores it.

Business Example:

  • ✅ Customer name in CRM matches name in billing system
  • ❌ Sales says "Acme Corp" but Finance says "ACME Corporation"

Red Flags:

  • Different totals in different reports for the same metric
  • Name/address variations across systems
  • Conflicting information when joining datasets
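
One way to surface mismatches like these is to compare the same field across two systems by a shared ID. The sketch below assumes each system can be exported as an ID-to-name mapping; the `normalize_name` helper and its suffix list are hypothetical and deliberately simple.

```python
import re

# Assumption: a small, illustrative list of suffix abbreviations to expand.
SUFFIXES = {"corp": "corporation", "inc": "incorporated", "ltd": "limited"}

def normalize_name(name):
    """Lowercase, strip punctuation, and expand common company suffixes."""
    words = re.sub(r"[^\w\s]", "", name).lower().split()
    return " ".join(SUFFIXES.get(word, word) for word in words)

def find_mismatches(system_a, system_b):
    """Map each shared ID whose raw values differ to (a, b, same_after_normalizing)."""
    mismatches = {}
    for record_id in system_a.keys() & system_b.keys():
        a, b = system_a[record_id], system_b[record_id]
        if a != b:
            mismatches[record_id] = (a, b, normalize_name(a) == normalize_name(b))
    return mismatches

# Hypothetical exports from a CRM and a billing system.
crm = {101: "Acme Corp", 102: "Globex Inc.", 103: "Initech"}
billing = {101: "ACME Corporation", 102: "Globex Inc.", 103: "Initech Holdings"}

for record_id, details in sorted(find_mismatches(crm, billing).items()):
    print(record_id, details)
```

Here record 101 differs only in formatting (the third value is `True`), while record 103 is a genuine conflict that needs a person to resolve.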

Dimension 4: Timeliness

Definition: Data is current enough to be useful for its intended purpose.

Business Example:

  • ✅ Inventory count updated today for fulfillment decisions
  • ❌ Using last month's customer list for a time-sensitive campaign

Red Flags:

  • Old timestamps on critical records
  • Decisions based on "when we last checked"
  • Stale reports being treated as current

Key Question: How fresh does this data need to be for my decision?
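
Once that freshness requirement is decided, checking it is mostly a timestamp comparison. The following sketch assumes each record carries an `updated_at` timestamp and that one day is the acceptable age; both are illustrative choices.

```python
from datetime import datetime, timedelta

def stale_records(records, max_age, now, ts_field="updated_at"):
    """Return the records whose last update is older than max_age."""
    return [rec for rec in records if now - rec[ts_field] > max_age]

# Hypothetical inventory rows; "now" is fixed so the example is repeatable.
now = datetime(2025, 1, 12, 9, 0)
inventory = [
    {"sku": "A1", "updated_at": datetime(2025, 1, 12, 6, 0)},   # updated today
    {"sku": "B2", "updated_at": datetime(2024, 12, 10, 6, 0)},  # about a month old
]

stale = stale_records(inventory, max_age=timedelta(days=1), now=now)
print([rec["sku"] for rec in stale])  # ['B2']
```

The threshold belongs to the decision, not the data: a one-day limit might suit fulfillment, while a monthly report could tolerate far older records.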

Dimension 5: Uniqueness

Definition: Each real-world entity is represented only once—no duplicate records.

Business Example:

  • ✅ One customer record per customer
  • ❌ "John Smith" appears three times with different contact info

Red Flags:

  • Customers receiving multiple copies of the same email
  • Totals that seem higher than expected
  • Conflicting information for the same entity
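
A simple first pass at finding duplicates is to group records by a normalized key. The sketch below matches on name only, which is an assumption made for illustration; two different people can share a name, so flagged groups are candidates for review rather than confirmed duplicates.

```python
from collections import defaultdict

def find_duplicates(records, key_fields):
    """Group records by normalized key fields; return groups with more than one record."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(str(rec[field]).strip().lower() for field in key_fields)
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}

# Hypothetical customer list: "John Smith" appears three times.
customers = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "john smith ", "email": "jsmith@example.org"},
    {"name": "John Smith", "email": "j.smith@example.net"},
    {"name": "Jane Doe", "email": "jane@example.com"},
]

for key, recs in find_duplicates(customers, ["name"]).items():
    print(key, len(recs))  # ('john smith',) 3
```

Adding more key fields (for example, name plus postal code) reduces false matches but can miss duplicates whose contact details differ.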

Dimension 6: Validity

Definition: Data conforms to defined business rules and formats.

Business Example:

  • ✅ Email addresses contain "@" and a domain
  • ❌ Age field contains "thirty-two" instead of "32"

Red Flags:

  • Invalid formats (phone numbers with letters)
  • Values outside acceptable ranges (age = -5)
  • Dates in inconsistent formats (12/01/2025 vs 2025-01-12)
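
Validity rules like these translate naturally into small automated checks. The sketch below is illustrative only: the field names, the 0 to 120 age range, the phone pattern, and the choice of `YYYY-MM-DD` as the required date format are all assumptions, and the email pattern is intentionally loose.

```python
import re
from datetime import datetime

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # has "@" and a dotted domain
PHONE_RE = re.compile(r"[0-9+\-() ]{7,20}")           # digits and separators only

def is_iso_date(text):
    """True if text parses as YYYY-MM-DD."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False

def validate(record):
    """Return the names of fields that break a rule (empty list means valid)."""
    errors = []
    if not EMAIL_RE.match(str(record.get("email", ""))):
        errors.append("email")
    age = record.get("age")
    if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age")
    if not PHONE_RE.fullmatch(str(record.get("phone", ""))):
        errors.append("phone")
    if not is_iso_date(record.get("signup_date")):
        errors.append("signup_date")
    return errors

good = {"email": "a@example.com", "age": 32,
        "phone": "555-123-4567", "signup_date": "2025-01-12"}
bad = {"email": "a.example.com", "age": "thirty-two",
       "phone": "555-CALL-NOW", "signup_date": "12/01/2025"}

print(validate(good))  # []
print(validate(bad))   # ['email', 'age', 'phone', 'signup_date']
```

Passing these checks shows only that a value is well-formed; a validly formatted email address can still be wrong, which is an accuracy problem rather than a validity one.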

Quick Reference Card

Dimension    | Check By Looking For
Accuracy     | Mismatches against source documents
Completeness | Missing values, blank fields
Consistency  | Mismatches across systems
Timeliness   | Stale timestamps, old data
Uniqueness   | Duplicate records
Validity     | Format errors, rule violations

Remember: Poor data quality has a real cost. By one estimate, companies lose an average of 43 hours per employee per year to data issues.

Next: Learn to spot common data problems that indicate quality issues.
