Infrastructure & Deployment

Cloud ML Infrastructure


Cloud-specific ML services are core interview topics. Know the trade-offs between managed services and self-managed Kubernetes.

Managed vs Self-Managed Comparison

| Aspect           | AWS SageMaker | GCP Vertex AI | Self-Managed K8s    |
|------------------|---------------|---------------|---------------------|
| Setup time       | Hours         | Hours         | Days/Weeks          |
| Cost at scale    | Higher        | Higher        | Lower               |
| Customization    | Limited       | Limited       | Full control        |
| GPU availability | On-demand     | On-demand     | Reserved instances  |
| Vendor lock-in   | High          | High          | Low                 |
| Best for         | Quick POCs    | Quick POCs    | Production at scale |

Interview Question: Build vs Buy

Question: "When would you use SageMaker vs self-managed Kubernetes?"

Framework Answer:

def choose_infrastructure(context):
    use_managed = (
        context["team_size"] < 5
        and context["ml_models"] < 10
        and context["budget"] > context["engineer_cost"] * 2
        and context["time_to_market"] == "urgent"
    )

    use_kubernetes = (
        context["team_size"] >= 5
        or context["ml_models"] >= 10
        or context["multi_cloud"]
        or (
            context["compliance"] in ["HIPAA", "PCI", "SOC2"]
            and context["requires_custom_controls"]
        )
    )

    if use_managed and not use_kubernetes:
        return {"training": "managed", "serving": "managed", "experimentation": "managed"}
    if use_kubernetes and not use_managed:
        return {"training": "k8s", "serving": "k8s", "experimentation": "k8s"}

    # Hybrid is often the answer
    return {
        "training": "managed" if context["data_stays_in_cloud"] else "k8s",
        "serving": "k8s",              # lower latency, better scaling control
        "experimentation": "managed",  # faster iteration
    }
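
A quick usage sketch, with a hypothetical startup profile chosen so the managed branch wins:

# Hypothetical context: small team, few models, urgent timeline
context = {
    "team_size": 3,
    "ml_models": 4,
    "budget": 500_000,
    "engineer_cost": 200_000,
    "time_to_market": "urgent",
    "multi_cloud": False,
    "compliance": "SOC2",
    "requires_custom_controls": False,
    "data_stays_in_cloud": True,
}

print(choose_infrastructure(context))
# {'training': 'managed', 'serving': 'managed', 'experimentation': 'managed'}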

AWS SageMaker Deep Dive

Key Components to Know:

# SageMaker interview topics
sagemaker_components:
  training:
    - Spot instances for up to 70% cost reduction
    - Distributed training with parameter servers
    - SageMaker Debugger for training insights

  inference:
    - Real-time endpoints (synchronous)
    - Batch Transform (async, large datasets)
    - Multi-model endpoints (cost sharing)
    - Serverless inference (pay per request)

  mlops:
    - SageMaker Pipelines (orchestration)
    - Model Registry (versioning)
    - Model Monitor (drift detection)
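
To make the spot-training line above concrete, here is a minimal sketch using the SageMaker Python SDK; the container image, IAM role ARN, and S3 paths are placeholders you would substitute:

import sagemaker
from sagemaker.estimator import Estimator

# Placeholders: swap in your training image, execution role, and buckets
estimator = Estimator(
    image_uri="<your-training-image>",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.g5.xlarge",
    use_spot_instances=True,   # managed spot training for cost reduction
    max_run=3600,              # cap on training time, in seconds
    max_wait=7200,             # must be >= max_run; time to wait for spot capacity
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume after interruption
    sagemaker_session=sagemaker.Session(),
)

estimator.fit({"train": "s3://my-bucket/train/"})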

Interview Question: "How do you reduce SageMaker inference costs?"

Answer:

  1. Multi-model endpoints: host hundreds of models behind a single endpoint
  2. Serverless inference: pay only per request, at the cost of cold-start latency
  3. Autoscaling: scale instance counts down during off-hours (see the sketch after this list)
  4. Spot instances: for training (not real-time inference), up to 70% savings
  5. Inference optimization: compile models with AWS Neuron for Inferentia chips
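
A minimal sketch of point 3 using boto3's Application Auto Scaling API; the endpoint name, variant name, and target value are hypothetical:

import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder names

# Register the endpoint variant as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Track invocations per instance so capacity follows traffic
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # invocations per instance per minute (assumed target)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)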

GCP Vertex AI Deep Dive

Key Differentiators:

# Vertex AI interview topics
vertex_components:
  unique_features:
    - AutoML for no-code model training
    - Feature Store (native integration)
    - Vizier for hyperparameter tuning
    - Matching Engine for vector search

  training:
    - Custom containers on Vertex Training
    - TPU support (v4 pods available)
    - Distributed training with Reduction Server

  serving:
    - Online prediction (real-time)
    - Batch prediction (large scale)
    - Private endpoints (VPC-native)
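
To make the online-prediction path concrete, a minimal sketch with the Vertex AI Python SDK (google-cloud-aiplatform); the project, bucket, and prebuilt sklearn serving image are placeholder assumptions:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# Upload a trained artifact with a prebuilt serving container (assumed image tag)
model = aiplatform.Model.upload(
    display_name="demo-model",
    artifact_uri="gs://my-bucket/model/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest",
)

# Deploy for online (real-time) prediction with autoscaling bounds
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3,
)

print(endpoint.predict(instances=[[0.1, 0.2, 0.3, 0.4]]))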

Interview Question: "When would you use TPUs vs GPUs?"

Answer:

  • TPUs: Large transformer training, Google-optimized (BERT, T5), batch processing
  • GPUs: Inference, PyTorch-heavy, custom architectures, real-time serving
  • Cost comparison: TPU v4 pods can be 3x more cost-efficient for training at scale
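
One way this decision shows up in code: a TensorFlow sketch that uses a TPU when the runtime exposes one and otherwise falls back to local GPUs (the no-argument resolver call assumes a Cloud TPU VM environment):

import tensorflow as tf

try:
    # Auto-detects a TPU on Cloud TPU VMs; raises if none is attached
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.MirroredStrategy()  # all local GPUs (or CPU)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")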

Multi-Cloud and Hybrid Patterns

# Multi-cloud ML architecture discussion
multi_cloud_reasons = [
    "GPU availability during shortages",
    "Best-of-breed services (Vertex AutoML + AWS Endpoints)",
    "Regulatory requirements (data residency)",
    "Vendor negotiation leverage"
]

# Key technologies for multi-cloud
multi_cloud_stack = {
    "orchestration": "Kubeflow Pipelines (cloud-agnostic)",
    "model_registry": "MLflow (portable)",
    "serving": "Seldon Core or KServe",
    "monitoring": "Prometheus + Grafana",
    "infrastructure": "Terraform with cloud-specific modules"
}
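
As a sketch of the portable-registry idea: MLflow's tracking and registry calls are the same regardless of which cloud ran the job. The tracking URI and model name below are hypothetical:

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # self-hosted, cloud-agnostic

# Toy model standing in for a real training job on any cloud
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

with mlflow.start_run() as run:
    mlflow.log_metric("train_acc", clf.score(X, y))
    mlflow.sklearn.log_model(clf, artifact_path="model")

# The same registry call applies whether training ran on AWS, GCP, or on-prem
mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-model")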

Interview Insight: Companies increasingly ask about multi-cloud due to GPU shortages and cost optimization. Show you understand both managed services AND self-managed Kubernetes.

Next module covers ML Pipelines & Orchestration interview questions.
