Building AI into Your Product Strategy
Writing AI Product Requirements
Traditional PRDs don't work for AI features. AI behavior is probabilistic, not deterministic, so the requirements have to be written differently.
Why AI PRDs Are Different
| Traditional PRD | AI PRD |
|---|---|
| "Show user's name" | "Predict user preference with 85%+ accuracy" |
| "Button click → action" | "Input → model → probabilistic output" |
| "Same input = same output" | "Same input can = different outputs" |
| "Test once, ship" | "Monitor continuously, retrain" |
The AI PRD Template
Section 1: Problem Statement
What you're solving:
We're building [AI capability] to solve [user problem]
because [evidence of problem] and current solutions
[fail because X].
Example:
We're building automated content moderation to solve the problem of scaling our review team. Currently, moderators review 5,000 posts/day manually, creating a 6-hour backlog. 80% of flagged content is obvious violations that don't need human judgment.
Section 2: Success Metrics
Define measurable thresholds:
| Metric | Target | Rationale |
|---|---|---|
| Accuracy | ≥90% | Industry benchmark for content moderation |
| Precision | ≥95% | False positives damage user trust |
| Recall | ≥85% | A small share of violations slipping through is acceptable |
| Latency | <500ms | Real-time user experience |
| Cost | <$0.01/prediction | Stay within budget |
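The precision and recall targets above can be checked mechanically from a confusion matrix. Here is a minimal sketch in Python; the helper names (`precision_recall`, `evaluate_against_targets`) are illustrative, not from any library:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Thresholds mirror the Success Metrics table above.
TARGETS = {"precision": 0.95, "recall": 0.85}

def evaluate_against_targets(tp: int, fp: int, fn: int) -> dict[str, bool]:
    """Return pass/fail for each PRD metric target."""
    precision, recall = precision_recall(tp, fp, fn)
    return {
        "precision": precision >= TARGETS["precision"],
        "recall": recall >= TARGETS["recall"],
    }
```

Writing the thresholds as data (not prose) makes it easy to wire them into an evaluation pipeline or a launch-gate check.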
Critical: Define what "correct" means
For classification tasks:
- What are all possible categories?
- How do you handle edge cases?
- What's the "gold standard" for comparison?
Section 3: Input/Output Specification
Input:
- What: User-generated text posts
- Format: UTF-8 strings, 1-5000 characters
- Languages: English, Spanish
- Volume: 50,000 posts/day
- Source: Posts API endpoint
Output:
- What: Moderation decision
- Format:

```
{
  "decision": "approve" | "flag" | "reject",
  "confidence": 0.0-1.0,
  "reason_code": "spam" | "hate" | "violence" | ...,
  "requires_human_review": boolean
}
```
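A spec like this is easy to enforce in code. Below is a minimal validation sketch using only the Python standard library; since the full set of reason codes is elided above, `VALID_REASONS` is an assumed placeholder set, and the class name is illustrative:

```python
from dataclasses import dataclass

VALID_DECISIONS = {"approve", "flag", "reject"}
# Assumed placeholder set -- the PRD elides the full list of reason codes.
VALID_REASONS = {"spam", "hate", "violence", "other"}

@dataclass
class ModerationResult:
    decision: str
    confidence: float
    reason_code: str
    requires_human_review: bool

    def validate(self) -> bool:
        """Check the result against the output contract in the PRD."""
        return (
            self.decision in VALID_DECISIONS
            and 0.0 <= self.confidence <= 1.0
            and self.reason_code in VALID_REASONS
            and isinstance(self.requires_human_review, bool)
        )
```

Validating model output at the boundary catches contract drift (a new reason code, an out-of-range confidence) before it reaches downstream systems.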
Section 4: Edge Cases & Error Handling
Define how to handle:
| Scenario | Expected Behavior |
|---|---|
| Model confidence <70% | Route to human review |
| Input too long | Truncate + process first 5000 chars |
| Unsupported language | Default to human review |
| Model timeout | Retry once, then queue for async |
| Model returns error | Log, alert, route to human review |
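The fallback table above translates almost directly into dispatch logic. This is a sketch under the PRD's thresholds; `call_model` is a stand-in for the real inference client, and logging/alerting are elided:

```python
MAX_CHARS = 5000
CONFIDENCE_FLOOR = 0.70
SUPPORTED_LANGUAGES = {"en", "es"}

def moderate(text: str, language: str, call_model) -> str:
    """Apply the PRD's edge-case rules around a model call."""
    if language not in SUPPORTED_LANGUAGES:
        return "human_review"                        # unsupported language
    text = text[:MAX_CHARS]                          # truncate long input
    try:
        decision, confidence = call_model(text)
    except TimeoutError:
        try:
            decision, confidence = call_model(text)  # retry once
        except TimeoutError:
            return "async_queue"                     # queue for async processing
    except Exception:
        return "human_review"                        # log + alert in real code
    if confidence < CONFIDENCE_FLOOR:
        return "human_review"                        # low-confidence fallback
    return decision
```

Every row of the table maps to one branch, which makes the PRD directly testable against the implementation.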
Section 5: Data Requirements
Training data:
- Minimum volume needed
- Data sources
- Labeling requirements
- Refresh frequency
Example:
Need 50,000 labeled examples (approved/flagged/rejected) from the past 12 months. Labels must match current policy. Quarterly retraining with new policy updates.
Section 6: Human-in-the-Loop Design
Where do humans stay involved?
| Touchpoint | Trigger | Action |
|---|---|---|
| Low confidence | <70% confidence | Human reviews decision |
| Appeals | User disputes | Human re-reviews |
| Audit | Random 5% sample | Quality check |
| Drift detection | Weekly accuracy drop >2% | Alert ML team |
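Two of these touchpoints, the random audit sample and the drift trigger, are simple enough to sketch directly. Function names are illustrative, and the real alerting hook is elided:

```python
import random

def drift_alert(last_week_acc: float, this_week_acc: float,
                max_drop: float = 0.02) -> bool:
    """True when week-over-week accuracy falls by more than `max_drop` (2 points)."""
    return (last_week_acc - this_week_acc) > max_drop

def needs_audit(sample_rate: float = 0.05, rng=random.random) -> bool:
    """Randomly select ~5% of decisions for human quality checks."""
    return rng() < sample_rate
```

Injecting the random source (`rng`) keeps the audit sampler deterministic under test.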
Section 7: Rollout Plan
AI features need gradual rollout:
| Phase | Traffic | Duration | Gate to Next |
|---|---|---|---|
| Shadow | 0% (logging only) | 2 weeks | Accuracy ≥85% |
| Canary | 5% | 1 week | No major issues |
| Beta | 25% | 2 weeks | Accuracy ≥88% |
| GA | 100% | Ongoing | Continuous monitoring |
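A phased rollout like this needs stable cohort assignment: a user who is in the 5% canary should stay in the cohort as traffic grows to 25% and 100%. A common approach is deterministic hashing of a stable user ID; this sketch uses the standard library:

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because the bucket depends only on the user ID, raising `percent` from 5 to 25 only ever adds users to the cohort, never removes them, so no one flips back and forth between model versions mid-rollout.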
Section 8: Monitoring & Alerts
What to track post-launch:
| Metric | Alert Threshold | Escalation |
|---|---|---|
| Accuracy | <85% | Page on-call |
| Latency p99 | >1000ms | Page on-call |
| Error rate | >1% | Slack alert |
| User appeals | >5% increase | Weekly review |
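Alert thresholds also work best expressed as data. Here is a minimal sketch of evaluating the table's rules against a metrics snapshot; the metric keys and escalation channel names are assumptions for illustration:

```python
# (metric key, trip condition, escalation channel) -- mirrors the table above.
ALERT_RULES = [
    ("accuracy", lambda v: v < 0.85, "page"),
    ("latency_p99_ms", lambda v: v > 1000, "page"),
    ("error_rate", lambda v: v > 0.01, "slack"),
]

def fired_alerts(metrics: dict[str, float]) -> list[tuple[str, str]]:
    """Return (metric, escalation) pairs for every rule that trips."""
    return [(name, channel) for name, trips, channel in ALERT_RULES
            if name in metrics and trips(metrics[name])]
```

Keeping rules in one table means the PRD thresholds and the production alerting config can be reviewed side by side.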
Common PRD Mistakes
1. Treating AI like traditional software
- Bad: "The model will correctly classify all spam"
- Good: "The model will classify spam with ≥92% precision"
2. Ignoring edge cases
- Bad: No mention of what happens when confidence is low
- Good: Explicit fallback to human review below 70% confidence
3. No baseline comparison
- Bad: "We want high accuracy"
- Good: "We want 90% accuracy vs. current 75% rule-based system"
4. Missing monitoring requirements
- Bad: PRD ends at launch
- Good: PRD includes ongoing metrics, alerts, and retraining triggers
Template Checklist
Before finalizing your AI PRD:
- Problem is clearly defined with evidence
- Success metrics have specific numbers
- Input/output formats are specified
- Edge cases are documented
- Human-in-the-loop touchpoints are defined
- Rollout plan has gates between phases
- Monitoring and alerting requirements are included
- Baseline comparison is established
Key Takeaway
AI PRDs must acknowledge uncertainty. Define success with ranges and thresholds, not absolutes. Plan for errors, monitoring, and continuous improvement.
Next: Should you build AI in-house or buy from vendors? Let's explore the decision framework.