GCP & Azure Fundamentals for Multi-Cloud

GCP Data & AI Services: BigQuery, Pub/Sub & Vertex AI

Google's data and AI services are often considered best-in-class. Understanding these services is crucial for architect interviews at data-centric companies.

BigQuery: Serverless Data Warehouse

BigQuery is GCP's flagship analytics service and a key differentiator.

Architecture & Key Features

What Makes BigQuery Unique:

  • Serverless: No infrastructure management
  • Separation of compute and storage: Each scales and is billed independently
  • Columnar storage: Optimized for analytics
  • Dremel execution engine: Massively parallel query execution
  • Petabyte scale: Handle massive datasets

Pricing Models

| Model | Best For | Pricing |
| --- | --- | --- |
| On-demand | Variable workloads | $6.25/TB scanned |
| Flat-rate (Editions) | Predictable workloads | $2,000/100 slots/month |
| Autoscaling | Variable with baseline | Baseline + burst slots |

⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider before making cost decisions.
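To make the trade-off between the on-demand and flat-rate models concrete, here is a minimal sketch of the break-even arithmetic. It uses the illustrative prices from the table above ($6.25 per TB scanned, $2,000 per 100 slots per month). The function names are hypothetical, and the sketch assumes a single slot commitment covers the whole workload, which ignores performance limits.

```python
# Illustrative prices copied from the pricing table; verify before relying on them.
ON_DEMAND_USD_PER_TB = 6.25
FLAT_RATE_USD_PER_100_SLOTS = 2000.0


def on_demand_monthly_cost(tb_scanned: float) -> float:
    """Monthly on-demand cost for a given volume of scanned data."""
    return tb_scanned * ON_DEMAND_USD_PER_TB


def break_even_tb(slot_blocks: int = 1) -> float:
    """TB scanned per month at which on-demand cost equals the flat-rate cost.

    slot_blocks is the number of 100-slot commitments (hypothetical sizing).
    """
    return slot_blocks * FLAT_RATE_USD_PER_100_SLOTS / ON_DEMAND_USD_PER_TB


def cheaper_model(tb_scanned: float, slot_blocks: int = 1) -> str:
    """Return the cheaper model for the month, ignoring performance limits."""
    flat = slot_blocks * FLAT_RATE_USD_PER_100_SLOTS
    return "on-demand" if on_demand_monthly_cost(tb_scanned) < flat else "flat-rate"


if __name__ == "__main__":
    print(break_even_tb())     # 320.0
    print(cheaper_model(50))   # on-demand
    print(cheaper_model(500))  # flat-rate
```

With these numbers, a team scanning less than about 320 TB per month would pay less on demand than for one 100-slot commitment.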

Interview Question: BigQuery vs Redshift

Q: "When would you recommend BigQuery over Amazon Redshift?"

A:

| Factor | BigQuery | Redshift |
| --- | --- | --- |
| Management | Serverless (no clusters) | Cluster management |
| Scaling | Automatic, instant | Manual, downtime for resize |
| Pricing | Per TB scanned | Per node hour |
| Best For | Variable/ad-hoc queries | Predictable, steady workloads |
| Streaming | Native ($0.05/GB) | Kinesis integration required |
| ML Integration | BigQuery ML built-in | SageMaker AI integration |

Choose BigQuery when:

  • Unknown or variable query patterns
  • Team wants zero ops overhead
  • Need built-in ML capabilities
  • Real-time streaming analytics required

BigQuery Best Practices

Cost Optimization:

-- Use partitioning to reduce scanned data
CREATE TABLE myproject.mydataset.events
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id
AS SELECT * FROM raw_events;

-- Cap query cost before running (a query that would exceed the limit fails instead of running)
-- Click "More" → "Query Settings" → "Maximum bytes billed"

Performance Optimization:

  • Partition by date/timestamp columns
  • Cluster by high-cardinality filter columns
  • Avoid SELECT * (specify columns)
  • Use materialized views for common aggregations
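The first and third tips both reduce cost because they reduce the bytes a query reads. The following is a simplified pure-Python model of that effect, not BigQuery's actual planner: the `Partition` class, the `estimate_scanned_bytes` function, and all byte sizes are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    """One daily partition with per-column sizes in bytes (made-up numbers)."""
    day: date
    column_bytes: dict


def estimate_scanned_bytes(partitions, columns=None, start=None, end=None):
    """Sum the bytes a query would read in this simplified model.

    columns=None models SELECT *; start/end model a filter on the partition column.
    """
    total = 0
    for p in partitions:
        if start is not None and p.day < start:
            continue  # partition pruned by the date filter
        if end is not None and p.day > end:
            continue  # partition pruned by the date filter
        for name, size in p.column_bytes.items():
            if columns is None or name in columns:
                total += size  # columnar storage: only selected columns are read
    return total


if __name__ == "__main__":
    table = [
        Partition(date(2024, 1, d), {"user_id": 10, "event_type": 20, "payload": 70})
        for d in range(1, 11)
    ]
    # SELECT * over all ten days reads every column of every partition.
    print(estimate_scanned_bytes(table))  # 1000
    # Selecting one column for one day reads a small fraction of that.
    one_day = date(2024, 1, 10)
    print(estimate_scanned_bytes(table, columns={"user_id"}, start=one_day, end=one_day))  # 10
```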

Cloud Pub/Sub: Messaging & Streaming

Google's managed messaging service, similar to AWS SNS + SQS combined.

Key Characteristics

| Feature | Pub/Sub | AWS Equivalent |
| --- | --- | --- |
| Model | Publish-subscribe | SNS + SQS combined |
| Ordering | Optional (per-key) | SQS FIFO |
| Retention | 7 days default (configurable to 31) | 14 days max (SQS) |
| Dead Letter | Supported | Supported |
| Push/Pull | Both | SNS push, SQS pull |

Pub/Sub Architecture Patterns

Event-Driven Architecture:

Publishers → Topic → Subscriptions → Subscribers
                    ├── Pull Subscription → Subscriber workers (e.g., on GKE or Compute Engine)
                    ├── Push Subscription → Cloud Run
                    └── BigQuery Subscription → BigQuery (direct)

BigQuery Subscriptions (unique to GCP): Write messages directly to BigQuery without code.
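The fan-out in the diagram above, where each subscription receives its own copy of every message published to the topic, can be sketched with a toy in-memory model. This is not the Pub/Sub client library: the `Topic` class and its methods are hypothetical, and acknowledgements, redelivery, and retention are deliberately left out.

```python
from collections import deque


class Topic:
    """Toy in-memory topic: every subscription gets its own copy of each message."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        """Create a named subscription with an independent backlog."""
        self._subscriptions[name] = deque()

    def publish(self, data):
        """Fan the message out to the backlog of every subscription."""
        for backlog in self._subscriptions.values():
            backlog.append(data)

    def pull(self, name):
        """Return the oldest unread message for one subscription, or None if empty."""
        backlog = self._subscriptions[name]
        return backlog.popleft() if backlog else None


if __name__ == "__main__":
    topic = Topic()
    topic.subscribe("analytics")
    topic.subscribe("billing")
    topic.publish(b"order-created")
    print(topic.pull("analytics"))  # b'order-created'
    print(topic.pull("billing"))    # b'order-created'
    print(topic.pull("analytics"))  # None
```

Because each backlog is independent, a slow consumer on one subscription does not hold back the others.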

Interview Question: Message Ordering

Q: "How do you guarantee message ordering in Pub/Sub?"

A: Use ordering keys:

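# Publisher
from google.cloud import pubsub_v1

# Ordering keys are rejected unless message ordering is enabled on the publisher
publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
# "my-project" and "my-topic" are placeholder IDs
topic_path = publisher.topic_path("my-project", "my-topic")

# Messages with same ordering_key are ordered
future = publisher.publish(
    topic_path,
    data=b"message",
    ordering_key="user-123"  # All user-123 messages in order
)
future.result()  # Block until the publish succeeds or raises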

Important: Ordering is per-subscription, per-ordering-key, and it only applies when message ordering is also enabled on the subscription. Messages with different keys may arrive out of order.
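To see what this guarantee does and does not cover, here is a small pure-Python check. It assumes each message is delivered exactly once (no redelivery), and the `respects_key_order` helper is hypothetical, not part of any Pub/Sub library.

```python
from collections import defaultdict


def respects_key_order(published, delivered):
    """Check that delivery preserves publish order within each ordering key.

    Both arguments are lists of (ordering_key, payload) tuples. Messages with
    different keys may interleave freely; only per-key order is compared.
    """
    def per_key(messages):
        grouped = defaultdict(list)
        for key, payload in messages:
            grouped[key].append(payload)
        return dict(grouped)

    return per_key(published) == per_key(delivered)


if __name__ == "__main__":
    published = [("user-1", "a"), ("user-2", "x"), ("user-1", "b"), ("user-2", "y")]
    # Keys interleave differently, but each key's messages stay in order.
    interleaved = [("user-2", "x"), ("user-1", "a"), ("user-2", "y"), ("user-1", "b")]
    # Here "b" arrives before "a" for user-1, which violates per-key ordering.
    swapped = [("user-1", "b"), ("user-1", "a"), ("user-2", "x"), ("user-2", "y")]
    print(respects_key_order(published, interleaved))  # True
    print(respects_key_order(published, swapped))      # False
```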

Dataflow: Stream & Batch Processing

GCP's managed Apache Beam service.

When to Use Dataflow

| Scenario | Dataflow | BigQuery |
| --- | --- | --- |
| Real-time transformation | Yes | Limited (streaming inserts) |
| Complex windowing | Yes | No |
| Cross-service ETL | Yes | Limited |
| ML inference pipeline | Yes | BigQuery ML only |
| Cost at scale | Higher | Lower for pure analytics |

Common Dataflow Patterns

Streaming ETL:

Pub/Sub → Dataflow (transform, enrich, window) → BigQuery
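The "window" step groups an unbounded stream into finite chunks so it can be aggregated. As a rough illustration, here is fixed (tumbling) windowing in plain Python. This is not the Apache Beam API, the function name is hypothetical, and late data and watermarks are ignored.

```python
from collections import Counter


def count_per_fixed_window(events, window_seconds=60):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events is an iterable of (event_time_seconds, key) tuples.
    """
    counts = Counter()
    for event_time, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(5, "click"), (42, "click"), (61, "click"), (75, "view")]
    print(count_per_fixed_window(events))
    # {(0, 'click'): 2, (60, 'click'): 1, (60, 'view'): 1}
```

In a real pipeline, each per-window result would become a row written to BigQuery.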

Batch Processing:

Cloud Storage (CSV/JSON) → Dataflow → BigQuery/Bigtable

Vertex AI: Unified ML Platform

Google's managed ML platform, competing with Amazon SageMaker AI (rebranded from "Amazon SageMaker" in 2024).

Vertex AI Components

| Component | Purpose | AWS Equivalent |
| --- | --- | --- |
| Workbench | Managed notebooks | SageMaker AI Studio |
| Training | Custom model training | SageMaker AI Training |
| Prediction | Model serving | SageMaker AI Endpoints |
| Pipelines | ML workflow orchestration | SageMaker AI Pipelines |
| Feature Store | Feature management | SageMaker AI Feature Store |
| Model Garden | Pre-trained models | SageMaker AI JumpStart |
| Gemini API | Foundation models | Amazon Bedrock |

Interview Question: Vertex AI vs SageMaker

Q: "What are the strengths of Vertex AI compared to SageMaker?"

A:

Vertex AI Strengths:

  • Tighter BigQuery integration (direct training from tables)
  • AutoML more mature (Google's ML heritage)
  • Gemini models for generative AI
  • Simpler pricing model
  • Better integration with data stack (BigQuery, Dataflow)

SageMaker Strengths:

  • Larger ecosystem of built-in algorithms
  • More deployment options (edge, batch, async)
  • Better multi-account governance
  • More mature MLOps features
  • Wider third-party integration

Data Architecture Decision Tree

Analytics/BI workload?
  ├── Yes → BigQuery (serverless, cost-effective)
  └── Need real-time transformation? → Dataflow + BigQuery

Messaging/Events?
  ├── Simple pub/sub → Pub/Sub
  ├── Need strong ordering → Pub/Sub with ordering keys
  └── Direct to BigQuery → BigQuery Subscription

ML/AI?
  ├── Tabular data in BigQuery → BigQuery ML
  ├── Custom training → Vertex AI Training
  └── Foundation models → Gemini API / Model Garden
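One way to rehearse the tree for an interview is to write it down as a function. The sketch below is only one possible encoding: the function name, the flag names, and the precedence between flags (for example, checking direct-to-BigQuery before ordering) are assumptions, since the tree itself does not rank its branches.

```python
def recommend_service(workload, *, realtime_transform=False, strong_ordering=False,
                      direct_to_bigquery=False, tabular_in_bigquery=False,
                      foundation_model=False):
    """Map a workload description to the service named in the decision tree."""
    if workload == "analytics":
        return "Dataflow + BigQuery" if realtime_transform else "BigQuery"
    if workload == "messaging":
        if direct_to_bigquery:
            return "BigQuery Subscription"
        if strong_ordering:
            return "Pub/Sub with ordering keys"
        return "Pub/Sub"
    if workload == "ml":
        if foundation_model:
            return "Gemini API / Model Garden"
        if tabular_in_bigquery:
            return "BigQuery ML"
        return "Vertex AI Training"
    raise ValueError(f"unknown workload: {workload!r}")


if __name__ == "__main__":
    print(recommend_service("analytics", realtime_transform=True))  # Dataflow + BigQuery
    print(recommend_service("messaging", strong_ordering=True))     # Pub/Sub with ordering keys
    print(recommend_service("ml", tabular_in_bigquery=True))        # BigQuery ML
```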

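Pro Tip: GCP's data services are deeply integrated. A common pattern is: Pub/Sub → Dataflow → BigQuery → Vertex AI. Because these services connect natively (for example, BigQuery subscriptions in Pub/Sub and direct training from BigQuery tables in Vertex AI), the pipeline typically needs less glue code than a comparable AWS stack.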

Next, we'll explore Azure core services.
