Understanding Data Quality
Asking the Right Questions
When you spot data problems or receive data you're unsure about, knowing what to ask is half the battle. These question templates help you communicate effectively with data teams and get the clarity you need.
The Five Essential Questions
1. "Where does this data come from?" (Source)
Why it matters: Data from different sources has different reliability levels.
What to ask:
- "Is this from our CRM, the accounting system, or a third-party source?"
- "Was this collected automatically or entered manually?"
- "How does this data get from the source to this report?"
What good answers look like:
- "This comes directly from Salesforce, updated every 4 hours"
- "Finance manually exports this from QuickBooks weekly"
Red flag answers:
- "I'm not sure where it comes from"
- "Someone just emails it to me"
2. "When was it last updated?" (Timeliness)
Why it matters: Yesterday's data might be perfect; last quarter's might be useless.
What to ask:
- "What's the 'as of' date for this data?"
- "How frequently is this refreshed?"
- "Is there a delay between when something happens and when it shows up here?"
What good answers look like:
- "This refreshes daily at 6 AM, so it's current as of yesterday"
- "There's a 24-hour delay for data processing"
Red flag answers:
- "I haven't checked in a while"
- "It should be recent" (without specifics)
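If the data team can give you an "as of" timestamp, the freshness question reduces to simple date arithmetic. The following is a minimal sketch, not a prescribed method: the function names, the sample timestamps, and the 24-hour threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def staleness(as_of: datetime, now: datetime) -> timedelta:
    """Return how old the data is relative to `now`."""
    return now - as_of


def is_fresh(as_of: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the data's age is within the agreed refresh window."""
    return staleness(as_of, now) <= max_age


# Hypothetical values: a report refreshed at 6 AM and checked at 3 PM the same day.
as_of = datetime(2024, 3, 4, 6, 0, tzinfo=timezone.utc)
now = datetime(2024, 3, 4, 15, 0, tzinfo=timezone.utc)

age = staleness(as_of, now)
print(f"Data is {age.total_seconds() / 3600:.0f} hours old")
print("Fresh enough:", is_fresh(as_of, now, max_age=timedelta(hours=24)))
```

The threshold is the part worth agreeing on with the data team: a daily refresh suggests a window of roughly a day, while a weekly manual export needs a much longer one.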
3. "How complete is this dataset?" (Completeness)
Why it matters: Decisions based on 60% of your customers might not apply to the other 40%.
What to ask:
- "Are there any filters or exclusions I should know about?"
- "What percentage of records have complete information?"
- "Are there customer segments or regions not included?"
What good answers look like:
- "This includes all active customers but excludes churned accounts"
- "95% of records have complete contact information"
Red flag answers:
- "It's pretty complete, I think"
- "I'm not sure what got filtered out"
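A figure like "95% of records have complete contact information" can be checked directly when you have access to the records. The sketch below assumes the data arrives as a list of dictionaries and that a field counts as missing when it is None or blank; the field names and sample records are made up for illustration.

```python
def is_filled(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def record_completeness(records, required_fields):
    """Share of records (0.0 to 1.0) in which every required field is filled."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(is_filled(record.get(field)) for field in required_fields)
    )
    return complete / len(records)


def field_completeness(records, required_fields):
    """Share of records with each individual field filled."""
    if not records:
        return {field: 0.0 for field in required_fields}
    return {
        field: sum(1 for record in records if is_filled(record.get(field))) / len(records)
        for field in required_fields
    }


# Hypothetical sample: four customers, two with a gap in their contact details.
records = [
    {"name": "Ana", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "Ben", "email": "", "phone": "555-0101"},
    {"name": "Cy", "email": "cy@example.com", "phone": None},
    {"name": "Di", "email": "di@example.com", "phone": "555-0103"},
]
required = ["email", "phone"]

print(f"Fully complete records: {record_completeness(records, required):.0%}")
for field, share in field_completeness(records, required).items():
    print(f"{field}: {share:.0%} filled")
```

Note that the two numbers answer different questions. Each field here is mostly filled, yet only half of the records are complete, so it is worth asking which of the two a quoted percentage refers to.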
4. "Are there any known data quality issues?" (Accuracy)
Why it matters: Every dataset has quirks. Knowing them prevents misinterpretation.
What to ask:
- "Are there known issues with specific fields or segments?"
- "Has there been any data migration or system change recently?"
- "What should I be careful about when interpreting this?"
What good answers look like:
- "Phone numbers for pre-2020 customers weren't validated, so about 10% of them are wrong"
- "We had a CRM migration last month; data from that period might be incomplete"
Red flag answers:
- "No issues" (suspiciously confident)
- "I haven't looked into it"
5. "What does this metric actually measure?" (Definition)
Why it matters: "Active users" can mean different things to different teams.
What to ask:
- "How exactly is this calculated?"
- "What's included and excluded from this number?"
- "Is this the same definition used in other reports?"
What good answers look like:
- "'Active' means logged in at least once in the last 30 days"
- "Revenue here is gross revenue before discounts, not net"
Red flag answers:
- "It's just the standard definition"
- Definitions that change when you dig deeper
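To see why the definition matters, consider how the same login data produces different "active user" counts under different windows. This is an illustrative sketch only: the 30-day window comes from the example answer above, while the 7-day alternative, the function name, and the sample logins are assumptions.

```python
from datetime import date, timedelta


def active_users(last_login_by_user, as_of, window_days):
    """Users whose last login falls within `window_days` before `as_of`."""
    cutoff = as_of - timedelta(days=window_days)
    return {
        user for user, last_login in last_login_by_user.items()
        if last_login >= cutoff
    }


# Hypothetical last-login dates for four users.
last_login_by_user = {
    "u1": date(2024, 3, 30),
    "u2": date(2024, 3, 10),
    "u3": date(2024, 1, 15),
    "u4": date(2024, 3, 26),
}
as_of = date(2024, 3, 31)

monthly = active_users(last_login_by_user, as_of, window_days=30)
weekly = active_users(last_login_by_user, as_of, window_days=7)

print("Active (30-day definition):", len(monthly))
print("Active (7-day definition):", len(weekly))
```

Both results are legitimately "active users", which is why two reports can disagree without either being wrong.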
Question Templates by Situation
When You Receive a New Report
1. "Can you walk me through where this data comes from and how it's refreshed?"
2. "Are there any known limitations or issues I should be aware of?"
3. "How do you define [key metric] in this context?"
When Numbers Don't Match Another Source
1. "I'm seeing different numbers in [other report]. Can you help me understand the discrepancy?"
2. "Are we using the same time period and filters?"
3. "Is there a difference in how we're calculating this?"
When You Spot a Potential Problem
1. "I noticed [specific issue]. Is this expected, or should I be concerned?"
2. "What's the impact if this data issue isn't addressed?"
3. "Is there a corrected version available, or a workaround?"
When Presenting Data to Others
1. "What caveats should I communicate about this data?"
2. "How confident are we in these numbers?"
3. "What questions should I be prepared to answer?"
The Data Request Template
When you need specific data, use this structure:
REQUEST: [What you need]
PURPOSE: [Why you need it / what decision it supports]
TIMEFRAME: [Date range needed]
FILTERS: [Any specific segments or conditions]
FORMAT: [How you need to receive it]
DEADLINE: [When you need it by]
Example:
REQUEST: Customer list with contact information
PURPOSE: Email campaign for Q1 product launch
TIMEFRAME: Last 12 months
FILTERS: Customers active in that period; exclude customers who opted out of marketing
FORMAT: Excel file with name, email, last purchase date
DEADLINE: January 15th
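If your team submits requests through a form or script, the template above maps naturally onto a small data structure that can flag blank fields before the request is sent. The sketch below is one possible way to do that; the DataRequest class and its missing and render methods are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DataRequest:
    """One field per line of the data request template."""
    request: str
    purpose: str
    timeframe: str
    filters: str
    format: str
    deadline: str

    def missing(self):
        """Names of template fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Format the request as 'FIELD: value' lines, in template order."""
        return "\n".join(
            f"{f.name.upper()}: {getattr(self, f.name)}" for f in fields(self)
        )


# Hypothetical draft in which the requester forgot to state a timeframe.
draft = DataRequest(
    request="Customer list with contact information",
    purpose="Email campaign for Q1 product launch",
    timeframe="",
    filters="Exclude customers who opted out of marketing",
    format="Excel file with name, email, last purchase date",
    deadline="January 15th",
)

if draft.missing():
    print("Fill in before sending:", ", ".join(draft.missing()))
else:
    print(draft.render())
```

Catching a blank field up front saves the data team a round of follow-up questions.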
Building Good Relationships with Data Teams
Do:
- Ask questions before you encounter problems
- Provide context for why you need data
- Give feedback when data was helpful
- Report issues promptly and specifically
Don't:
- Wait until a presentation to discover problems
- Blame data teams for issues you didn't investigate
- Make assumptions about what data means
- Mark a request as urgent when it isn't
Key Insight: The best data consumers are curious, not accusatory. Frame questions as "help me understand" rather than "why is this wrong."
Next: Learn to read and interpret dashboards and data visualizations like a pro.