Infrastructure & Deployment
Docker for ML Models
5 min read
Docker containerization is fundamental to MLOps, and interviewers expect you to know how to optimize containers specifically for ML workloads.
Interview Question: Multi-Stage Builds
Question: "How would you optimize a Docker image for serving a PyTorch model?"
Strong Answer Structure:
# Stage 1: Build stage - includes build tools
FROM python:3.11-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt
# Stage 2: Runtime stage - minimal
FROM python:3.11-slim AS runtime
WORKDIR /app
# Copy only wheels, not build tools
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
# Copy model and inference code
COPY model/ ./model/
COPY src/ ./src/
# Non-root user for security
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8080
CMD ["python", "src/serve.py"]
Key Optimization Strategies
| Strategy | Impact | How to Explain |
|---|---|---|
| Multi-stage builds | 60-80% size reduction | "Build dependencies don't ship to production" |
| Layer ordering | Faster rebuilds | "Put rarely-changing layers first (OS, deps)" |
| .dockerignore | Smaller build context | "Exclude training data, notebooks, tests" (see the sketch below the table) |
| Slim base images | 5x smaller | "python:3.11-slim vs python:3.11" |
| pip install --no-cache-dir | 100MB+ savings | "pip's download cache doesn't belong in the final image" |
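The .dockerignore row deserves a concrete artifact. A minimal sketch, with entries that are illustrative rather than prescriptive:
# .dockerignore -- keep training artifacts and dev clutter out of the build context
data/
notebooks/
tests/
.git/
__pycache__/
*.ipynb
*.ckpt
Without it, docker build ships the entire repository (often gigabytes of data) to the daemon before the first layer is even built.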
Common Interview Follow-ups
Q: "How do you handle model weights?"
# Option 1: Bake into image (for small models <500MB)
COPY model_weights.pt /app/model/
# Option 2: Download at startup (for large models)
# Use an init container or entrypoint script (see the sketch after this block)
ENV MODEL_PATH=/models/bert-large
RUN mkdir -p /models
# Option 3: Mount from volume (production)
# At runtime: -v /host/models:/app/models
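For Option 2, here is a minimal startup sketch, assuming hypothetical MODEL_URL and MODEL_PATH environment variables and weights reachable over plain HTTPS; a real deployment would more likely pull from S3 or a model registry with authentication:
# download_weights.py -- illustrative sketch; env var names and paths are assumptions
import os
import urllib.request
from pathlib import Path

def ensure_weights() -> Path:
    """Download weights on first start; reuse them on later restarts."""
    url = os.environ["MODEL_URL"]
    path = Path(os.environ.get("MODEL_PATH", "/models/weights.pt"))
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return path

if __name__ == "__main__":
    print(f"weights ready at {ensure_weights()}")
Calling this from the entrypoint before serve.py starts keeps large weights out of the image while still giving the container a self-contained startup path.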
Q: "How do you handle GPU containers?"
# Use NVIDIA base image for CUDA support
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
# Or PyTorch with CUDA pre-installed
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
# Verify GPU access at runtime
# docker run --gpus all my-ml-image nvidia-smi
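Beyond nvidia-smi, a runtime self-check is worth mentioning. A minimal sketch, assuming PyTorch is installed in the image:
# gpu_check.py -- illustrative startup check
import torch

def assert_gpu() -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA unavailable -- was the container started with --gpus?")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

if __name__ == "__main__":
    assert_gpu()
Failing fast at startup is cheaper than discovering a silent CPU fallback through production latency graphs.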
Security Best Practices
Interviewers often probe for security awareness:
# 1. Never run as root in production
USER appuser
# 2. Pin versions for reproducibility
FROM python:3.11.7-slim@sha256:abc123...
# 3. Scan images in CI/CD
# trivy image my-ml-image:latest
# 4. Don't embed secrets
# Bad: ENV AWS_SECRET_KEY=xyz
# Good: Use secrets manager at runtime
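A minimal sketch of the "secrets manager at runtime" point: the serving code reads credentials from a file the orchestrator mounts, or from an environment variable it injects, never from the image. Paths and variable names here are illustrative assumptions:
# runtime_secrets.py -- illustrative; paths and variable names are assumptions
import os
from pathlib import Path

def load_secret(name: str, mount_dir: str = "/run/secrets") -> str:
    """Prefer a mounted secret file (Docker/Kubernetes), fall back to an env var."""
    secret_file = Path(mount_dir) / name
    if secret_file.exists():
        return secret_file.read_text().strip()
    value = os.environ.get(name.upper())
    if value is None:
        raise RuntimeError(f"secret {name!r} was not provided at runtime")
    return value

# e.g. token = load_secret("model_registry_token")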
Interview Signal: Mentioning Trivy or Grype for container scanning shows security maturity.
Next, we'll cover Kubernetes interview questions.