Infrastructure & Deployment

Docker for ML Models

5 min read

Docker containerization is fundamental to MLOps, and interviewers expect you to know how to optimize containers for ML workloads.

Interview Question: Multi-Stage Builds

Question: "How would you optimize a Docker image for serving a PyTorch model?"

Strong Answer Structure:

# Stage 1: Build stage - includes build tools
FROM python:3.11-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: Runtime stage - minimal
FROM python:3.11-slim AS runtime

WORKDIR /app

# Copy only wheels, not build tools
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels

# Copy model and inference code
COPY model/ ./model/
COPY src/ ./src/

# Non-root user for security
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

EXPOSE 8080
CMD ["python", "src/serve.py"]

Key Optimization Strategies

| Strategy | Impact | How to Explain |
| --- | --- | --- |
| Multi-stage builds | 60-80% size reduction | "Build dependencies don't ship to production" |
| Layer ordering | Faster rebuilds | "Put rarely-changing layers first (OS, then dependencies)" |
| .dockerignore | Smaller build context | "Exclude training data, notebooks, and tests" (example below) |
| Slim base images | ~5x smaller | "python:3.11-slim vs python:3.11" |
| pip install --no-cache-dir | 100MB+ savings | "The pip cache is dead weight inside a container" |
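
The .dockerignore row is easy to make concrete. A minimal sketch, assuming a typical repo layout with data/, notebooks/, and tests/ directories; adjust the entries to your project:

# .dockerignore - keep the build context small
.git
.venv/
__pycache__/
*.pyc
# training data and experiment artifacts never belong in a serving image
data/
mlruns/
# development-only files
notebooks/
*.ipynb
tests/

A smaller context also speeds up docker build itself, because the entire context is sent to the daemon before the first layer is built.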

Common Interview Follow-ups

Q: "How do you handle model weights?"

# Option 1: Bake into image (for small models <500MB)
COPY model_weights.pt /app/model/

# Option 2: Download at startup (for large models)
# Use init container or entrypoint script
ENV MODEL_PATH=/models/bert-large
RUN mkdir -p /models

# Option 3: Mount from volume (production)
# At runtime: -v /host/models:/app/models
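
For Option 2, a hedged sketch of an entrypoint script; MODEL_URL, the download location, and the availability of curl in the image are assumptions about how the weights are hosted:

#!/bin/sh
# entrypoint.sh - download weights on first start, then launch the server
set -e

MODEL_PATH="${MODEL_PATH:-/models/model.pt}"

if [ ! -f "$MODEL_PATH" ]; then
    echo "Fetching model weights from $MODEL_URL"
    mkdir -p "$(dirname "$MODEL_PATH")"
    curl -fsSL "$MODEL_URL" -o "$MODEL_PATH"
fi

# Replace the shell with the server process so it receives signals directly
exec python src/serve.py

In the Dockerfile this replaces the plain CMD: copy the script in, make it executable, and set ENTRYPOINT ["./entrypoint.sh"], with MODEL_URL injected at deploy time.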

Q: "How do you handle GPU containers?"

# Use NVIDIA base image for CUDA support
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Or PyTorch with CUDA pre-installed
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

# Verify GPU access at runtime
# docker run --gpus all my-ml-image nvidia-smi
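
One nuance worth stating: the plain nvidia/cuda runtime image ships the CUDA libraries but no Python or PyTorch, so you still install the serving stack on top. A minimal sketch, assuming the same requirements.txt and src/ layout as the earlier Dockerfile:

FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

WORKDIR /app

# CUDA base images ship no Python interpreter; install it plus pip
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY src/ ./src/

CMD ["python3", "src/serve.py"]

The pytorch/pytorch images skip this step because the framework is already baked in, at the cost of a larger base image.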

Security Best Practices

Interviewers often probe for security awareness:

# 1. Never run as root in production
USER appuser

# 2. Pin versions for reproducibility
FROM python:3.11.7-slim@sha256:abc123...

# 3. Scan images in CI/CD
# trivy image my-ml-image:latest

# 4. Don't embed secrets
# Bad: ENV AWS_SECRET_KEY=xyz
# Good: Use secrets manager at runtime
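
As a concrete follow-up to point 4, here is a hedged sketch of keeping secrets out of image layers; the secret id aws_creds, the S3 path, and the presence of the AWS CLI in the build stage are illustrative assumptions:

# Build-time secret via BuildKit: the file is mounted only for this RUN
# step and never written into an image layer
# Build with: docker build --secret id=aws_creds,src=$HOME/.aws/credentials .
RUN --mount=type=secret,id=aws_creds,target=/root/.aws/credentials \
    aws s3 cp s3://my-bucket/model_weights.pt /app/model/

# Runtime secrets: inject from the orchestrator or an env file, never bake them in
# docker run --env-file /path/to/secrets.env my-ml-image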

Interview Signal: Mentioning Trivy or Grype for container scanning shows security maturity.

Next, we'll cover Kubernetes interview questions.

Take Quiz