Introduction to MLOps
The ML Lifecycle
Unlike traditional software with a linear development cycle, ML systems follow a continuous loop. Understanding this lifecycle is key to building effective MLOps practices.
The ML Development Loop
   ┌─────────────────────────────────────────┐
   │                                         │
   ▼                                         │
┌────────┐    ┌────────┐    ┌────────┐    ┌──┴─────┐
│  Data  │───▶│ Train  │───▶│ Deploy │───▶│Monitor │
└────────┘    └────────┘    └────────┘    └────────┘
   ▲                                         │
   │                                         │
   └─────────── Retrain ◄────────────────────┘
Stage Breakdown
| Stage | Activities | Key Tools |
|---|---|---|
| Data | Collection, validation, versioning | DVC, Great Expectations |
| Train | Experimentation, model building | MLflow, W&B |
| Deploy | Packaging, serving, scaling | BentoML, KServe |
| Monitor | Performance tracking, drift detection | Evidently, Arize |
| Retrain | Trigger detection, automated pipelines | Kubeflow, Airflow |
Data Stage
Data is the foundation of any ML system: poor data leads to poor models, no matter how good the training code is.
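# Example: DVC for data versioning (shell commands, run inside a Git repository)
# Initialize DVC in your project
# $ dvc init
# Track a dataset
# $ dvc add data/training_data.csv
# This creates or updates:
# - data/training_data.csv.dvc (small metadata file that Git tracks)
# - data/.gitignore (so Git ignores the large data file itself)
# Commit the metadata, not the data
# $ git add data/training_data.csv.dvc data/.gitignore
# $ git commit -m "Track training data with DVC"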
Key activities:
- Data collection and ingestion
- Data validation and quality checks
- Feature engineering
- Data versioning and lineage
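The validation step in the list above can be sketched in plain Python. This is a minimal illustration, not how Great Expectations or any other tool works: the `ColumnRule` format, the `validate_rows` helper, and the churn column names are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnRule:
    """Validation rule for one column (hypothetical schema format)."""
    dtype: type
    min_value: float | None = None
    max_value: float | None = None
    nullable: bool = False


def validate_rows(rows: list[dict], schema: dict[str, ColumnRule]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        for name, rule in schema.items():
            value = row.get(name)
            if value is None:
                if not rule.nullable:
                    problems.append(f"row {i}: '{name}' is missing")
                continue
            if not isinstance(value, rule.dtype):
                problems.append(f"row {i}: '{name}' has type {type(value).__name__}")
                continue
            if rule.min_value is not None and value < rule.min_value:
                problems.append(f"row {i}: '{name}'={value} is below {rule.min_value}")
            if rule.max_value is not None and value > rule.max_value:
                problems.append(f"row {i}: '{name}'={value} is above {rule.max_value}")
    return problems


# Hypothetical columns for a churn dataset
SCHEMA = {
    "tenure_months": ColumnRule(dtype=int, min_value=0),
    "monthly_charge": ColumnRule(dtype=float, min_value=0.0, max_value=10_000.0),
}

batch = [
    {"tenure_months": 12, "monthly_charge": 49.9},   # valid
    {"tenure_months": -3, "monthly_charge": 20.0},   # negative tenure
    {"tenure_months": 5, "monthly_charge": None},    # missing charge
]
problems = validate_rows(batch, SCHEMA)
for problem in problems:
    print(problem)
```

A pipeline would typically refuse to train when `problems` is non-empty, so that bad batches are caught before they reach the Train stage.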
Train Stage
Where data becomes models through experimentation.
import mlflow
import mlflow.sklearn

# train_model, evaluate, and the X_*/y_* splits are placeholders for your own
# training code; the model is assumed to be a scikit-learn estimator.

# Track experiments with MLflow
mlflow.set_experiment("customer-churn")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_param("n_estimators", 100)

    # Train model
    model = train_model(X_train, y_train)

    # Log metrics
    mlflow.log_metric("accuracy", evaluate(model, X_test, y_test))

    # Save model
    mlflow.sklearn.log_model(model, "model")
Key activities:
- Hyperparameter tuning
- Model selection
- Experiment tracking
- Model validation
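Hyperparameter tuning and model selection boil down to trying candidate configurations, scoring each one, and keeping the best. The sketch below shows that loop as a plain grid search using only the standard library. `score_config` is a hypothetical stand-in for "train a model and return its validation score", and the toy objective inside it is invented so that the example runs without a dataset.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score, results)."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def score_config(params):
    """Hypothetical stand-in for training a model and returning validation accuracy."""
    # Toy objective that peaks at n_estimators=200, max_depth=8.
    return (
        1.0
        - abs(params["n_estimators"] - 200) / 1000
        - abs(params["max_depth"] - 8) / 100
    )


grid = {"n_estimators": [100, 200, 400], "max_depth": [4, 8, 16]}
best_params, best_score, results = grid_search(grid, score_config)
print(best_params, round(best_score, 3))
```

In a real project each call to `score_fn` would be its own tracked run, so that every configuration and its metric end up side by side in the experiment tracker.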
Deploy Stage
Moving models from notebooks to production.
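# Example: BentoML service definition (class-based service API)
import bentoml


@bentoml.service
class ChurnPredictor:
    def __init__(self):
        # load_model is a placeholder for your own model-loading code
        self.model = load_model()

    @bentoml.api
    def predict(self, features: dict) -> float:
        # Assumes the model accepts a list of feature dicts and returns numeric predictions
        return float(self.model.predict([features])[0])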
Key activities:
- Model packaging
- Serving infrastructure
- A/B testing
- Rollback strategies
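A/B testing and rollback can be illustrated with a small traffic router. This is a sketch under assumptions, not a feature of BentoML or KServe: the `ModelRouter` class and the version names `churn-v1` and `churn-v2` are hypothetical. Hashing the user ID keeps each user on the same variant across requests.

```python
from __future__ import annotations

import hashlib


class ModelRouter:
    """Route a fixed share of traffic to a candidate model, with one-call rollback."""

    def __init__(self, stable: str, candidate: str | None = None,
                 candidate_share: float = 0.0):
        self.stable = stable
        self.candidate = candidate
        self.candidate_share = candidate_share

    def choose(self, user_id: str) -> str:
        """Pick a model version deterministically from the user ID."""
        if self.candidate is None or self.candidate_share <= 0.0:
            return self.stable
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        return self.candidate if bucket < self.candidate_share else self.stable

    def rollback(self) -> None:
        """Send all traffic back to the stable model."""
        self.candidate = None
        self.candidate_share = 0.0


router = ModelRouter(stable="churn-v1", candidate="churn-v2", candidate_share=0.1)
assignments = [router.choose(f"user-{i}") for i in range(10_000)]
candidate_fraction = assignments.count("churn-v2") / len(assignments)
print(f"share routed to candidate: {candidate_fraction:.3f}")

router.rollback()
after_rollback = {router.choose(f"user-{i}") for i in range(100)}
print(after_rollback)
```

Keeping the stable version loaded while the candidate serves a small share is what makes rollback cheap: it is a configuration change, not a redeployment.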
Monitor Stage
Production models need constant observation.
| What to Monitor | Why |
|---|---|
| Prediction latency | User experience, SLAs |
| Data drift | Input distribution changes |
| Model accuracy | Performance degradation |
| Resource usage | Cost optimization |
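To make the data drift row concrete, here is a minimal sketch of one common drift score, the Population Stability Index (PSI), for a single numeric feature. It is an illustration only: tools such as Evidently offer many more tests, and the bin count, the `1e-6` floor for empty bins, and the sample data are all assumptions chosen for the example.

```python
import math


def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(1000)]   # training-time feature values
shifted = [x + 3 for x in reference]         # live values after a shift

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(f"same distribution: {psi_same:.3f}, shifted: {psi_shifted:.3f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the right threshold depends on the feature and should be tuned against real incidents.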
The Continuous Loop
Unlike "deploy and done" software, ML systems require:
- Continuous monitoring - Models degrade over time
- Automatic triggers - Detect when retraining is needed
- Automated pipelines - Retrain without manual intervention
- Gradual rollouts - Deploy new models safely
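The trigger and rollout items above can be expressed as two small decision functions. This is a sketch under stated assumptions: the `HealthSnapshot` fields, the threshold defaults, and the rollout steps are hypothetical values picked for illustration, and a real pipeline would call an orchestrator such as Airflow or Kubeflow where this example only returns a decision.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    """Metrics a monitoring job might report (all field names are hypothetical)."""
    drift_score: float         # distribution-distance score for a key feature
    accuracy: float            # accuracy on recently labeled data
    days_since_training: int


def retrain_reasons(snapshot, max_drift=0.25, min_accuracy=0.80, max_age_days=30):
    """Return the triggers that fired; an empty list means no retraining is needed."""
    reasons = []
    if snapshot.drift_score > max_drift:
        reasons.append("data_drift")
    if snapshot.accuracy < min_accuracy:
        reasons.append("accuracy_drop")
    if snapshot.days_since_training > max_age_days:
        reasons.append("model_age")
    return reasons


def next_rollout_share(current_share, healthy, steps=(0.05, 0.25, 0.5, 1.0)):
    """Advance a gradual rollout by one step, or drop to 0.0 if the new model is unhealthy."""
    if not healthy:
        return 0.0
    for step in steps:
        if step > current_share:
            return step
    return 1.0


snapshot = HealthSnapshot(drift_score=0.31, accuracy=0.84, days_since_training=12)
reasons = retrain_reasons(snapshot)
print(reasons)
print(next_rollout_share(0.05, healthy=True))
```

Returning the list of reasons rather than a bare boolean makes each automated retrain auditable: the pipeline run can record why it was started.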
Key insight: The ML lifecycle is not a waterfall—it's a flywheel. Each iteration should improve the system.
Next, we'll explore MLOps maturity levels and what distinguishes advanced organizations.