# Platform Maturity Model

Platform maturity models help you assess current capabilities and plan improvements. This lesson covers the five levels of platform maturity, how to assess where you stand, and roadmaps for advancing to the next level.
## Why Maturity Models Matter

Maturity models provide:

- Baseline assessment: Understand where you are today
- Goal setting: Define target state for your platform
- Prioritization: Focus on highest-impact improvements
- Communication: Shared language with stakeholders
- Progress tracking: Measure advancement over time
## Five Levels of Platform Maturity

```text
┌────────────────────────────────────────────────────────────────────┐
│                      Platform Maturity Levels                      │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  Level 4: Optimizing     ████████████████████████                  │
│    • AI-assisted operations          Continuous improvement        │
│    • Predictive scaling              Innovation focus              │
│    • Self-healing platform                                         │
│                                                                    │
│  Level 3: Standardized   ████████████████████                      │
│    • Golden paths enforced           Consistent experience         │
│    • Self-service mature             Measured outcomes             │
│    • Platform-as-Product                                           │
│                                                                    │
│  Level 2: Managed        ████████████████                          │
│    • Central tooling                 Reduced variance              │
│    • Basic self-service              Improving DX                  │
│    • Documentation exists                                          │
│                                                                    │
│  Level 1: Reactive       ████████████                              │
│    • Ticket-based requests           High manual effort            │
│    • Inconsistent tooling            Slow onboarding               │
│    • Tribal knowledge                                              │
│                                                                    │
│  Level 0: Ad-hoc         ████████                                  │
│    • No platform team                Complete chaos                │
│    • Every team does their own       Maximum friction              │
│    • No standardization                                            │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
### Level 0: Ad-hoc

No formal platform exists. Each team manages infrastructure independently.

```yaml
level_0_characteristics:
  infrastructure:
    - Teams provision their own cloud resources
    - No standard deployment process
    - Inconsistent security practices
    - Manual server configuration
  developer_experience:
    - High cognitive load
    - Steep learning curve for new hires
    - Duplicated effort across teams
    - Slow time-to-first-deploy
  operations:
    - No central visibility
    - Incident response varies by team
    - Cost tracking impossible
    - Compliance gaps
  indicators:
    - "Every team uses different CI tools"
    - "New developers take weeks to deploy"
    - "We don't know our cloud spend by team"
    - "Security finds issues in every audit"
```
### Level 1: Reactive

A basic platform team exists, but it operates in a ticket-driven mode.

```yaml
level_1_characteristics:
  infrastructure:
    - Central team handles requests
    - Standard tooling recommended (not enforced)
    - Some automation exists
    - Basic templates available
  developer_experience:
    - Requests take days or weeks
    - Bottleneck on platform team
    - Documentation scattered
    - Some tribal knowledge captured
  operations:
    - Central monitoring exists
    - Incident response improving
    - Basic cost tracking
    - Manual compliance checks
  indicators:
    - "Submit a ticket for a new database"
    - "Platform team is always backlogged"
    - "We have docs but they're outdated"
    - "Onboarding takes 1-2 weeks"
  typical_metrics:
    ticket_backlog: "50+ tickets"
    request_lead_time: "3-7 days"
    onboarding_time: "1-2 weeks"
    deployment_frequency: "Weekly"
```
### Level 2: Managed

The platform provides consistent tooling with basic self-service.

```yaml
level_2_characteristics:
  infrastructure:
    - Self-service for common resources
    - Standard CI/CD pipelines
    - Infrastructure as Code adopted
    - Environment provisioning automated
  developer_experience:
    - Portal with basic catalog
    - Templates for new services
    - Centralized documentation
    - Defined golden paths
  operations:
    - Centralized observability
    - SLOs defined for platform
    - Cost allocation by team
    - Automated compliance scans
  indicators:
    - "I can create a database without tickets"
    - "All services use the same CI pipeline"
    - "New services start from templates"
    - "I can see my team's cloud costs"
  typical_metrics:
    self_service_ratio: "50-70%"
    request_lead_time: "< 1 day"
    onboarding_time: "2-5 days"
    deployment_frequency: "Daily"
```
### Level 3: Standardized

The platform operates as an internal product with mature self-service.

```yaml
level_3_characteristics:
  infrastructure:
    - Full self-service infrastructure
    - Policy-as-Code enforced
    - Multi-cloud abstraction
    - Automatic resource optimization
  developer_experience:
    - Comprehensive developer portal
    - All services in catalog
    - Rich documentation (TechDocs)
    - Golden paths for all common patterns
  operations:
    - Error budgets and SLOs active
    - Automated remediation
    - FinOps practices mature
    - Continuous compliance
  indicators:
    - "I deploy to production multiple times daily"
    - "Platform automatically scales my services"
    - "I've never submitted an infra ticket"
    - "Compliance is automatic"
  typical_metrics:
    self_service_ratio: "90%+"
    request_lead_time: "Minutes"
    onboarding_time: "< 1 day"
    deployment_frequency: "On-demand"
```
### Level 4: Optimizing

The platform improves continuously through advanced automation and AI assistance.

```yaml
level_4_characteristics:
  infrastructure:
    - AI-optimized resource allocation
    - Predictive scaling
    - Self-healing systems
    - Automatic cost optimization
  developer_experience:
    - AI-assisted development
    - Personalized recommendations
    - Intelligent error resolution
    - Continuous experience improvement
  operations:
    - AIOps for incident prediction
    - Autonomous remediation
    - Real-time cost optimization
    - Proactive security patching
  indicators:
    - "Platform predicts and prevents incidents"
    - "AI suggests optimal configurations"
    - "Costs automatically optimized weekly"
    - "Zero-touch operations for 90%+ of issues"
  typical_metrics:
    automation_rate: "95%+"
    incident_prevention_rate: "70%+"
    time_to_remediation: "< 5 minutes"
    developer_nps: "70+"
```
## Maturity Assessment Scorecard

Use this scorecard to assess your current level across five key dimensions. Score each dimension from 0 to 4, sum the scores, and read your overall level from the mapping at the end.

```yaml
# maturity-assessment.yaml
assessment_dimensions:
  self_service:
    level_0:
      description: "No self-service"
      indicators:
        - All infrastructure requests via tickets
        - Manual provisioning for everything
      score: 0
    level_1:
      description: "Limited self-service"
      indicators:
        - Some resources available via wiki guides
        - Scripts shared informally
      score: 1
    level_2:
      description: "Basic self-service"
      indicators:
        - Portal with resource provisioning
        - Templates for common patterns
      score: 2
    level_3:
      description: "Comprehensive self-service"
      indicators:
        - All infrastructure via self-service
        - Policy guardrails automatic
      score: 3
    level_4:
      description: "Intelligent self-service"
      indicators:
        - AI recommendations for configurations
        - Predictive resource provisioning
      score: 4
  developer_experience:
    level_0:
      description: "Fragmented"
      score: 0
    level_1:
      description: "Documented"
      score: 1
    level_2:
      description: "Consistent"
      score: 2
    level_3:
      description: "Optimized"
      score: 3
    level_4:
      description: "Delightful"
      score: 4
  observability:
    level_0:
      description: "No monitoring"
      score: 0
    level_1:
      description: "Basic metrics"
      score: 1
    level_2:
      description: "Centralized dashboards"
      score: 2
    level_3:
      description: "SLO-driven"
      score: 3
    level_4:
      description: "Predictive/AIOps"
      score: 4
  security_compliance:
    level_0:
      description: "Manual audits"
      score: 0
    level_1:
      description: "Periodic scans"
      score: 1
    level_2:
      description: "Automated scans"
      score: 2
    level_3:
      description: "Policy-as-Code"
      score: 3
    level_4:
      description: "Continuous compliance"
      score: 4
  cost_management:
    level_0:
      description: "Unknown costs"
      score: 0
    level_1:
      description: "Monthly reports"
      score: 1
    level_2:
      description: "Team allocation"
      score: 2
    level_3:
      description: "Real-time tracking"
      score: 3
    level_4:
      description: "AI optimization"
      score: 4

scoring:
  total_possible: 20
  level_mapping:
    "0-4": "Level 0 (Ad-hoc)"
    "5-8": "Level 1 (Reactive)"
    "9-12": "Level 2 (Managed)"
    "13-16": "Level 3 (Standardized)"
    "17-20": "Level 4 (Optimizing)"
```

For example, scores of 2, 2, 2, 1, and 1 across the five dimensions sum to 8, which falls in the 5-8 band and therefore maps to Level 1 (Reactive). The filled-in template below walks through exactly that case.
### Assessment Template

```yaml
# platform-assessment-template.yaml
assessment:
  date: "2025-01-15"
  assessor: "Platform Team"
  organization: "Example Corp"

  dimensions:
    self_service:
      current_score: 2
      evidence:
        - "Backstage portal deployed"
        - "3 templates available"
        - "Database provisioning self-service"
      gaps:
        - "No Kubernetes namespace self-service"
        - "Manual secret management"
    developer_experience:
      current_score: 2
      evidence:
        - "Central documentation exists"
        - "Golden path for web services"
        - "Onboarding time ~3 days"
      gaps:
        - "No TechDocs integration"
        - "Local development setup manual"
    observability:
      current_score: 2
      evidence:
        - "Prometheus/Grafana deployed"
        - "Standard dashboards available"
      gaps:
        - "No SLOs defined"
        - "Alerting inconsistent"
    security_compliance:
      current_score: 1
      evidence:
        - "Quarterly security scans"
        - "Basic RBAC in place"
      gaps:
        - "No policy-as-code"
        - "Compliance is manual"
    cost_management:
      current_score: 1
      evidence:
        - "Monthly AWS cost reports"
      gaps:
        - "No team allocation"
        - "No optimization automation"

  summary:
    total_score: 8
    current_level: "Level 1 (Reactive)"
    target_level: "Level 3 (Standardized)"
    priority_gaps:
      - "Implement policy-as-code (Kyverno)"
      - "Deploy Kubecost for cost allocation"
      - "Define platform SLOs"
```
## Maturity Roadmap

### Level 1 to Level 2 Roadmap

```yaml
# roadmap-level-1-to-2.yaml
roadmap:
  current_level: 1
  target_level: 2
  timeline: "6 months"

  phase_1:
    name: "Foundation"
    duration: "Month 1-2"
    objectives:
      - Deploy developer portal (Backstage)
      - Create first 3 software templates
      - Centralize documentation
    key_results:
      - Backstage live with SSO
      - 50% of services in catalog
      - All teams can access docs
    investments:
      - "Backstage setup: 2 engineers × 4 weeks"
      - "Template creation: 1 engineer × 2 weeks"
      - "Doc migration: 1 engineer × 2 weeks"

  phase_2:
    name: "Self-Service"
    duration: "Month 3-4"
    objectives:
      - Enable database self-service (Crossplane)
      - Standardize CI/CD pipelines
      - Implement cost allocation
    key_results:
      - Zero tickets for standard databases
      - 80% of services use standard CI
      - Cost visible by team
    investments:
      - "Crossplane setup: 2 engineers × 4 weeks"
      - "Pipeline templates: 1 engineer × 3 weeks"
      - "Kubecost deployment: 1 engineer × 1 week"

  phase_3:
    name: "Observability"
    duration: "Month 5-6"
    objectives:
      - Deploy centralized observability
      - Create standard dashboards
      - Define initial SLOs
    key_results:
      - All services have dashboards
      - Platform SLOs published
      - Alert fatigue reduced 50%
    investments:
      - "Prometheus/Grafana: 1 engineer × 2 weeks"
      - "Dashboard templates: 1 engineer × 3 weeks"
      - "SLO definition: Platform team × 2 weeks"

  success_criteria:
    self_service_ratio: "> 50%"
    onboarding_time: "< 1 week"
    ticket_reduction: "50%"
    developer_satisfaction: "> 60% positive"
```
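Phase 1 calls for the first software templates; in Backstage these are scaffolder templates. The sketch below is a minimal example, with the template name, skeleton path, and GitHub owner as placeholder assumptions rather than a reference implementation:

```yaml
# template.yaml — minimal Backstage software template sketch (names are placeholders)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: web-service-template
  title: Web Service
  description: Scaffolds a web service on the golden path
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required:
        - name
      properties:
        name:
          type: string
          description: Unique service name
  steps:
    # Copy the skeleton project, filling in the service name.
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: "${{ parameters.name }}"
    # Create the repository for the new service.
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: "github.com?owner=example-org&repo=${{ parameters.name }}"
    # Register the new service in the software catalog.
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: "${{ steps.publish.output.repoContentsUrl }}"
        catalogInfoPath: /catalog-info.yaml
```

The `fetch:template`, `publish:github`, and `catalog:register` actions are standard scaffolder actions, so a template like this needs no custom plugins.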
### Level 2 to Level 3 Roadmap

```yaml
# roadmap-level-2-to-3.yaml
roadmap:
  current_level: 2
  target_level: 3
  timeline: "9 months"

  phase_1:
    name: "Policy-as-Code"
    duration: "Month 1-3"
    objectives:
      - Implement Kyverno/OPA for policies
      - Automate compliance checks
      - Enable policy self-service
    key_results:
      - 100% of deployments pass policy checks
      - Compliance reports automated
      - Teams can request policy exceptions

  phase_2:
    name: "Advanced Self-Service"
    duration: "Month 4-6"
    objectives:
      - Complete infrastructure self-service
      - Multi-environment automation
      - Golden paths for all patterns
    key_results:
      - Zero infrastructure tickets
      - Prod deployments fully automated
      - 5+ golden path templates

  phase_3:
    name: "Platform-as-Product"
    duration: "Month 7-9"
    objectives:
      - Implement feedback loops
      - Track platform NPS
      - Optimize based on metrics
    key_results:
      - Monthly developer surveys
      - NPS > 50
      - Feature prioritization data-driven

  success_criteria:
    self_service_ratio: "> 90%"
    deployment_frequency: "On-demand"
    developer_nps: "> 50"
    compliance_automation: "100%"
```
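"Complete infrastructure self-service" in phase 2 usually means developers request infrastructure through claims rather than tickets. Assuming the platform team has published a Crossplane composite resource definition for PostgreSQL (the API group, kind, and parameter names below are hypothetical), the developer-facing claim could look like this:

```yaml
# database-claim.yaml — hypothetical developer-facing Crossplane claim
apiVersion: database.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: orders-db
  namespace: team-orders
spec:
  # Parameters exposed by the platform team's XRD schema (hypothetical).
  parameters:
    storageGB: 20
    version: "16"
  # Pick which platform-provided composition satisfies the claim.
  compositionSelector:
    matchLabels:
      tier: standard
  # Connection details land in a secret in the team's namespace.
  writeConnectionSecretToRef:
    name: orders-db-conn
```

Because claims are ordinary Kubernetes resources, the policy checks from phase 1 apply to them too, so guardrails stay automatic as self-service expands.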
## Continuous Assessment

### Quarterly Review Template

```yaml
# quarterly-review.yaml
quarterly_platform_review:
  metrics_review:
    adoption:
      - Services in catalog (current vs target)
      - Template usage rate
      - Self-service ratio
    developer_experience:
      - NPS score
      - Support ticket volume
      - Onboarding time
    reliability:
      - Platform uptime
      - SLO achievement
      - Incident count
    efficiency:
      - Cost per deployment
      - Time to provision
      - Automation rate

  qualitative_review:
    feedback_themes:
      - Top 3 developer complaints
      - Top 3 feature requests
      - Success stories
    roadmap_progress:
      - Completed initiatives
      - Delayed initiatives
      - New priorities

  action_planning:
    continue:
      - What's working well
    stop:
      - What to deprioritize
    start:
      - New initiatives
```
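To make the template concrete, here is an excerpt of what a completed review might look like; every figure is a hypothetical placeholder:

```yaml
# q1-review-snapshot.yaml — illustrative excerpt, all figures hypothetical
quarterly_platform_review:
  quarter: "2025-Q1"
  metrics_review:
    adoption:
      services_in_catalog: { current: 62, target: 80 }
      self_service_ratio: "58%"  # self-service requests / total requests
    developer_experience:
      nps: 42
      onboarding_time: "3 days"
  action_planning:
    continue: ["Golden-path service templates"]
    stop: ["Maintaining per-team custom pipelines"]
    start: ["SLO dashboards for platform services"]
```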
## Summary

Platform maturity progression:
| Level | Focus | Key Capability |
|---|---|---|
| 0 - Ad-hoc | Survival | None |
| 1 - Reactive | Stability | Central team |
| 2 - Managed | Efficiency | Self-service basics |
| 3 - Standardized | Scale | Platform-as-Product |
| 4 - Optimizing | Innovation | AI-assisted ops |
Key principles:
- Assess honestly: Don't overestimate current state
- Progress incrementally: Don't skip levels
- Focus on outcomes: Metrics over features
- Iterate continuously: Maturity is a journey
Next Lesson: Future of Platform Engineering explores emerging trends, AI integration, and what platform engineering looks like in 2026 and beyond.