Building AI into Your Product Strategy
Writing AI Product Requirements
Traditional PRDs don't work for AI features. AI behavior is probabilistic, not deterministic, so the requirements have to be written differently.
Why AI PRDs Are Different
| Traditional PRD | AI PRD |
|---|---|
| "Show user's name" | "Predict user preference with 85%+ accuracy" |
| "Button click → action" | "Input → model → probabilistic output" |
| "Same input = same output" | "Same input can = different outputs" |
| "Test once, ship" | "Monitor continuously, retrain" |
The AI PRD Template
Section 1: Problem Statement
What you're solving:
We're building [AI capability] to solve [user problem]
because [evidence of problem] and current solutions
[fail because X].
Example:
We're building automated content moderation to solve the problem of scaling our review team. Currently, moderators review 5,000 posts/day manually, creating a 6-hour backlog. 80% of flagged content is obvious violations that don't need human judgment.
Section 2: Success Metrics
Define measurable thresholds:
| Metric | Target | Rationale |
|---|---|---|
| Accuracy | ≥90% | Industry benchmark for content moderation |
| Precision | ≥95% | False positives damage user trust |
| Recall | ≥85% | A small share of violations slipping through is acceptable |
| Latency | <500ms | Real-time user experience |
| Cost | <$0.01/prediction | Stay within budget |
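The precision and recall targets above can be checked mechanically from a confusion matrix. Here is a minimal sketch in Python; the helper names (`precision_recall`, `evaluate_against_targets`) are illustrative, not from any library:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Thresholds mirror the Success Metrics table above.
TARGETS = {"precision": 0.95, "recall": 0.85}

def evaluate_against_targets(tp: int, fp: int, fn: int) -> dict[str, bool]:
    """Return pass/fail for each PRD metric target."""
    precision, recall = precision_recall(tp, fp, fn)
    return {
        "precision": precision >= TARGETS["precision"],
        "recall": recall >= TARGETS["recall"],
    }
```

Writing the thresholds as data (not prose) makes it easy to wire them into an evaluation pipeline or a launch-gate check.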
Critical: Define what "correct" means
For classification tasks:
- What are all possible categories?
- How do you handle edge cases?
- What's the "gold standard" for comparison?
Section 3: Input/Output Specification
Input:
- What: User-generated text posts
- Format: UTF-8 strings, 1-5000 characters
- Languages: English, Spanish
- Volume: 50,000 posts/day
- Source: Posts API endpoint
Output:
- What: Moderation decision
- Format:

```
{
  "decision": "approve" | "flag" | "reject",
  "confidence": 0.0-1.0,
  "reason_code": "spam" | "hate" | "violence" | ...,
  "requires_human_review": boolean
}
```
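A spec like this is easy to enforce in code. Below is a minimal validation sketch using only the Python standard library; since the full set of reason codes is elided above, `VALID_REASONS` is an assumed placeholder set, and the class name is illustrative:

```python
from dataclasses import dataclass

VALID_DECISIONS = {"approve", "flag", "reject"}
# Assumed placeholder set -- the PRD elides the full list of reason codes.
VALID_REASONS = {"spam", "hate", "violence", "other"}

@dataclass
class ModerationResult:
    decision: str
    confidence: float
    reason_code: str
    requires_human_review: bool

    def validate(self) -> bool:
        """Check the result against the output contract in the PRD."""
        return (
            self.decision in VALID_DECISIONS
            and 0.0 <= self.confidence <= 1.0
            and self.reason_code in VALID_REASONS
            and isinstance(self.requires_human_review, bool)
        )
```

Validating model output at the boundary catches contract drift (a new reason code, an out-of-range confidence) before it reaches downstream systems.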
Section 4: Edge Cases & Error Handling
Define how to handle:
| Scenario | Expected Behavior |
|---|---|
| Model confidence <70% | Route to human review |
| Input too long | Truncate + process first 5000 chars |
| Unsupported language | Default to human review |
| Model timeout | Retry once, then queue for async |
| Model returns error | Log, alert, route to human review |
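The fallback table above translates almost directly into dispatch logic. This is a sketch under the PRD's thresholds; `call_model` is a stand-in for the real inference client, and logging/alerting are elided:

```python
MAX_CHARS = 5000
CONFIDENCE_FLOOR = 0.70
SUPPORTED_LANGUAGES = {"en", "es"}

def moderate(text: str, language: str, call_model) -> str:
    """Apply the PRD's edge-case rules around a model call."""
    if language not in SUPPORTED_LANGUAGES:
        return "human_review"                        # unsupported language
    text = text[:MAX_CHARS]                          # truncate long input
    try:
        decision, confidence = call_model(text)
    except TimeoutError:
        try:
            decision, confidence = call_model(text)  # retry once
        except TimeoutError:
            return "async_queue"                     # queue for async processing
    except Exception:
        return "human_review"                        # log + alert in real code
    if confidence < CONFIDENCE_FLOOR:
        return "human_review"                        # low-confidence fallback
    return decision
```

Every row of the table maps to one branch, which makes the PRD directly testable against the implementation.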
Section 5: Data Requirements
Training data:
- Minimum volume needed
- Data sources
- Labeling requirements
- Refresh frequency
Example:
Need 50,000 labeled examples (approved/flagged/rejected) from the past 12 months. Labels must match current policy. Quarterly retraining with new policy updates.
Section 6: Human-in-the-Loop Design
Where do humans stay involved?
| Touchpoint | Trigger | Action |
|---|---|---|
| Low confidence | <70% confidence | Human reviews decision |
| Appeals | User disputes | Human re-reviews |
| Audit | Random 5% sample | Quality check |
| Drift detection | Weekly accuracy drop >2% | Alert ML team |
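Two of these touchpoints, the random audit sample and the drift trigger, are simple enough to sketch directly. Function names are illustrative, and the real alerting hook is elided:

```python
import random

def drift_alert(last_week_acc: float, this_week_acc: float,
                max_drop: float = 0.02) -> bool:
    """True when week-over-week accuracy falls by more than `max_drop` (2 points)."""
    return (last_week_acc - this_week_acc) > max_drop

def needs_audit(sample_rate: float = 0.05, rng=random.random) -> bool:
    """Randomly select ~5% of decisions for human quality checks."""
    return rng() < sample_rate
```

Injecting the random source (`rng`) keeps the audit sampler deterministic under test.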
Section 7: Rollout Plan
AI features need gradual rollout:
| Phase | Traffic | Duration | Gate to Next |
|---|---|---|---|
| Shadow | 0% (logging only) | 2 weeks | Accuracy ≥85% |
| Canary | 5% | 1 week | No major issues |
| Beta | 25% | 2 weeks | Accuracy ≥88% |
| GA | 100% | Ongoing | Continuous monitoring |
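A phased rollout like this needs stable cohort assignment: a user who is in the 5% canary should stay in the cohort as traffic grows to 25% and 100%. A common approach is deterministic hashing of a stable user ID; this sketch uses the standard library:

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because the bucket depends only on the user ID, raising `percent` from 5 to 25 only ever adds users to the cohort, never removes them, so no one flips back and forth between model versions mid-rollout.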
Section 8: Monitoring & Alerts
What to track post-launch:
| Metric | Alert Threshold | Escalation |
|---|---|---|
| Accuracy | <85% | Page on-call |
| Latency p99 | >1000ms | Page on-call |
| Error rate | >1% | Slack alert |
| User appeals | >5% increase | Weekly review |
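Alert thresholds also work best expressed as data. Here is a minimal sketch of evaluating the table's rules against a metrics snapshot; the metric keys and escalation channel names are assumptions for illustration:

```python
# (metric key, trip condition, escalation channel) -- mirrors the table above.
ALERT_RULES = [
    ("accuracy", lambda v: v < 0.85, "page"),
    ("latency_p99_ms", lambda v: v > 1000, "page"),
    ("error_rate", lambda v: v > 0.01, "slack"),
]

def fired_alerts(metrics: dict[str, float]) -> list[tuple[str, str]]:
    """Return (metric, escalation) pairs for every rule that trips."""
    return [(name, channel) for name, trips, channel in ALERT_RULES
            if name in metrics and trips(metrics[name])]
```

Keeping rules in one table means the PRD thresholds and the production alerting config can be reviewed side by side.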
Common PRD Mistakes
1. Treating AI like traditional software
- Bad: "The model will correctly classify all spam"
- Good: "The model will classify spam with ≥92% precision"
2. Ignoring edge cases
- Bad: No mention of what happens when confidence is low
- Good: Explicit fallback to human review below 70% confidence
3. No baseline comparison
- Bad: "We want high accuracy"
- Good: "We want 90% accuracy vs. current 75% rule-based system"
4. Missing monitoring requirements
- Bad: PRD ends at launch
- Good: PRD includes ongoing metrics, alerts, and retraining triggers
Template Checklist
Before finalizing your AI PRD:
- Problem is clearly defined with evidence
- Success metrics have specific numbers
- Input/output formats are specified
- Edge cases are documented
- Human-in-the-loop touchpoints are defined
- Rollout plan has gates between phases
- Monitoring and alerting requirements are included
- Baseline comparison is established
Key Takeaway
AI PRDs must acknowledge uncertainty. Define success with ranges and thresholds, not absolutes. Plan for errors, monitoring, and continuous improvement.
Next: Should you build AI in-house or buy from vendors? Let's explore the decision framework.