AWS Architecture & Services Deep Dive

AWS Storage Services: S3, EBS, EFS & FSx

Storage selection impacts performance, cost, and availability. Interviewers frequently ask about trade-offs between storage options.

S3: Object Storage

Storage Classes & Use Cases

| Class | Retrieval Time | Use Case | Cost (per GB/month) |
| --- | --- | --- | --- |
| Standard | Immediate | Frequently accessed | $0.023 |
| Intelligent-Tiering | Immediate | Unknown access patterns | $0.023 in the frequent-access tier + per-object monitoring fee |
| Standard-IA | Immediate | Infrequent, quick access needed | $0.0125 |
| One Zone-IA | Immediate | Reproducible data, cost-sensitive | $0.01 |
| Glacier Instant Retrieval | Milliseconds | Archive, immediate access | $0.004 |
| Glacier Flexible Retrieval | Minutes to 12 hours | Archive, flexible retrieval | $0.0036 |
| Glacier Deep Archive | 12-48 hours | Long-term archive | $0.00099 |
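
To make the cost trade-off concrete, here is a minimal sketch that multiplies a data size by the per-GB prices listed in the table above. It covers storage cost only; retrieval fees, request fees, and minimum storage durations are deliberately left out, and the function names are illustrative rather than part of any AWS SDK. The dictionary keys mirror the S3 `StorageClass` values.

```python
# Storage-only prices (USD per GB per month), copied from the table above.
# Retrieval and request fees are NOT modeled here.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,       # Glacier Instant Retrieval
    "GLACIER": 0.0036,         # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Return the monthly storage cost in USD for size_gb in one class."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


def cheapest_class(size_gb: float, allowed: list[str]) -> str:
    """Pick the cheapest class among those whose retrieval time is acceptable."""
    return min(allowed, key=lambda cls: monthly_storage_cost(size_gb, cls))
```

For example, `monthly_storage_cost(10_000, "STANDARD")` gives roughly $230 per month for 10 TB, while the same data in `DEEP_ARCHIVE` costs about $9.90 before any retrieval charges.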

Interview Question: S3 Performance

Q: "Your application needs to read 50,000 objects from S3 in under 60 seconds. How do you optimize?"

A: S3 performance optimization strategies:

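  1. Prefix parallelization: Distribute objects across multiple prefixes; each prefix supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second
  2. S3 Transfer Acceleration: Route transfers through CloudFront edge locations; this helps mainly when clients are geographically far from the bucket's Region
  3. Byte-range fetches: Download large objects in parallel chunks using ranged GET requests (the download counterpart of multipart upload)
  4. Request parallelization: Use concurrent connections (50,000 objects ÷ 60 s ≈ 833 requests/second, well within a single prefix's GET limit)
  5. S3 Select: Query data within objects to reduce transfer (AWS has closed S3 Select to new customers, so check availability before relying on it)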
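
The request-parallelization arithmetic and fan-out can be sketched as follows. This is a minimal illustration, not a tuned client: `fetch_all`, `required_rate`, and `prefixes_needed` are hypothetical helper names, the worker count of 64 is an arbitrary assumption, and the `fetch` callable stands in for whatever actually downloads one object.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

GET_LIMIT_PER_PREFIX = 5_500  # GET requests per second per prefix, from the text


def required_rate(num_objects: int, deadline_s: float) -> float:
    """Requests per second needed to read num_objects within deadline_s."""
    return num_objects / deadline_s


def prefixes_needed(rate: float) -> int:
    """Minimum number of prefixes so that no prefix exceeds its GET limit."""
    return max(1, math.ceil(rate / GET_LIMIT_PER_PREFIX))


def fetch_all(keys: Iterable[str],
              fetch: Callable[[str], bytes],
              max_workers: int = 64) -> Dict[str, bytes]:
    """Download every key concurrently; fetch(key) returns the object body."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its body.
        return dict(zip(key_list, pool.map(fetch, key_list)))
```

With boto3, `fetch` could be something like `lambda key: s3.get_object(Bucket=bucket, Key=key)["Body"].read()`, where `s3` and `bucket` are assumed to be defined by the caller. Threads fit here because the work is network-bound, so the interpreter lock is released while waiting on responses.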

S3 Security Best Practices

  • Enable S3 Block Public Access at account level
  • Use bucket policies with least-privilege
  • Encrypt at rest: SSE-S3 is applied to new objects by default; use SSE-KMS when you need control over keys and an audit trail of their use
  • Enable versioning for critical data
  • Use S3 Access Points for multi-tenant access control
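
As an illustration of the least-privilege bullet, the sketch below builds a bucket policy that lets a single IAM role read objects under one prefix and nothing else. The bucket name, prefix, and role ARN in the usage note are placeholders, and `read_only_prefix_policy` is a hypothetical helper, not an AWS API.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str, role_arn: str) -> dict:
    """Build a bucket policy granting one role read-only access to one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyOnePrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                # Only object reads: no listing, writes, or deletes.
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


def policy_json(bucket: str, prefix: str, role_arn: str) -> str:
    """Serialize the policy to the JSON string that S3 expects."""
    return json.dumps(read_only_prefix_policy(bucket, prefix, role_arn), indent=2)
```

Calling `policy_json("example-bucket", "reports/", "arn:aws:iam::111122223333:role/ReportReader")` yields a document that could be passed as the `Policy` argument of boto3's `put_bucket_policy`. Starting from a single allowed action and widening only when needed keeps the policy easy to audit.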

EBS: Block Storage

Volume Types Comparison

| Type | Max IOPS | Max Throughput | Use Case |
| --- | --- | --- | --- |
| gp3 | 16,000 (3,000 baseline) | 1,000 MB/s | General workloads, boot volumes |
| gp2 | 16,000 (3 IOPS/GB baseline; small volumes burst to 3,000) | 250 MB/s | Legacy, burstable workloads |
| io2 Block Express | 256,000 | 4,000 MB/s | Critical databases, SAP HANA |
| st1 | N/A | 500 MB/s | Big data, log processing |
| sc1 | N/A | 250 MB/s | Cold data, infrequent access |
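
The difference between gp2 and gp3 is easiest to see with numbers. gp2 ties baseline IOPS to volume size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), whereas gp3 starts at 3,000 IOPS regardless of size and lets you provision more up to the limit in the table. The helper names below are illustrative only.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP_MAX_IOPS = 16_000       # ceiling for gp2 and gp3, per the table
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume, which scales with its size."""
    return min(GP_MAX_IOPS, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp3_extra_iops(target_iops: int) -> int:
    """IOPS to provision on top of the gp3 baseline to reach target_iops."""
    if target_iops > GP_MAX_IOPS:
        raise ValueError("target exceeds the gp3 limit; consider io2 Block Express")
    return max(0, target_iops - GP3_BASELINE_IOPS)
```

A 100 GiB gp2 volume has a 300 IOPS baseline, so reaching 3,000 sustained IOPS on gp2 means paying for 1,000 GiB; on gp3 the same 100 GiB volume gets 3,000 IOPS without extra provisioning.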

Interview Question: EBS vs Instance Store

Q: "When would you use instance store instead of EBS?"

A: Instance store (ephemeral) suits:

  • Temporary data: Scratch space, buffers, caches
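  • High IOPS needs: NVMe instance store on storage-optimized instances such as i3en delivers from tens of thousands to millions of IOPS, depending on instance size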
  • Cost sensitivity: No additional storage charges
  • Distributed systems: Where data is replicated elsewhere (Cassandra, Kafka)

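Critical: Instance store data is lost when the instance stops, hibernates, or terminates, or when the underlying disk fails; it survives only a reboot. Never use it for durable storage.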

EBS Optimization Pattern

Database Tier:
  - Primary: io2 Block Express (up to 256K IOPS, Multi-Attach disabled)
  - Replicas: gp3 (16K IOPS, cost-optimized)

Application Tier:
  - Boot: gp3 (3,000 IOPS baseline)
  - Data: Based on workload

Archive:
  - Snapshots moved to the EBS Snapshots Archive tier

EFS: Elastic File System

Performance Modes

| Mode | Use Case | Latency |
| --- | --- | --- |
| General Purpose | Web serving, CMS, containers | Low (sub-ms) |
| Max I/O | Big data, media processing | Higher (ms) |

Throughput Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| Bursting | Scales with storage size | Variable workloads |
| Provisioned | Fixed throughput | Consistent performance |
| Elastic | Auto-scales throughput | Unpredictable workloads |

Interview Question: EFS vs EBS

Q: "Your application runs on 5 EC2 instances and needs shared file storage. Compare EFS and EBS."

A:

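| Factor | EFS | EBS Multi-Attach |
| --- | --- | --- |
| Sharing | Thousands of instances | Up to 16 Nitro instances per io1/io2 volume |
| Protocol | NFS (POSIX-compliant) | Block-level (needs a cluster-aware file system) |
| Availability scope | Multi-AZ (regional) | Single AZ |
| Use case | Shared content, CMS | Clustered databases |
| Cost | Higher ($0.30/GB-month) | Lower ($0.125/GB-month for io2, plus provisioned IOPS) |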

Recommendation: EFS for true shared filesystem needs; EBS Multi-Attach only for specific cluster applications.

FSx: Managed File Systems

FSx Options

| Service | File System | Use Case |
| --- | --- | --- |
| FSx for Windows | NTFS | Windows workloads, AD integration |
| FSx for Lustre | Lustre | HPC, ML training, video rendering |
| FSx for NetApp ONTAP | ONTAP | Enterprise NAS replacement |
| FSx for OpenZFS | ZFS | Linux/NFS workloads |

Interview Question: FSx for Lustre with S3

Q: "You need to process 10TB of data from S3 with high throughput for ML training. What architecture?"

A: Use FSx for Lustre with S3 integration:

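  1. Create an FSx for Lustre file system linked to the S3 bucket
  2. Data is lazy-loaded: file metadata is visible immediately, and contents are copied from S3 on first access
  3. Throughput scales with file system size, to 100+ GB/s for ML workloads
  4. Write results back to S3 through the data repository link (automatic export must be enabled; otherwise run an export task)
  5. Delete the file system after processing so you pay only for the time it exists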
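
A hedged sketch of the first step: the function below only builds the keyword arguments one might pass to boto3's `fsx.create_file_system`, so it can be inspected without touching an AWS account. It assumes a `SCRATCH_2` deployment, which suits a file system that is deleted after the job, and it relies on the documented capacity rule for that type (1,200 GiB, 2,400 GiB, or multiples of 2,400 GiB). The bucket name, subnet ID, and the `results` export prefix are placeholders.

```python
import math


def scratch_capacity_gib(dataset_gib: float) -> int:
    """Round a dataset size up to a valid SCRATCH_2 storage capacity."""
    if dataset_gib <= 1200:
        return 1200
    return math.ceil(dataset_gib / 2400) * 2400


def lustre_request(bucket: str, subnet_id: str, dataset_gib: float) -> dict:
    """Keyword arguments for fsx.create_file_system (request is not sent here)."""
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": scratch_capacity_gib(dataset_gib),
        "SubnetIds": [subnet_id],
        "LustreConfiguration": {
            "DeploymentType": "SCRATCH_2",
            # Objects under ImportPath appear as files and load on first read.
            "ImportPath": f"s3://{bucket}",
            # ExportPath must be in the same bucket as ImportPath.
            "ExportPath": f"s3://{bucket}/results",
        },
    }
```

For the 10 TB dataset in the question, `scratch_capacity_gib(10 * 1024)` returns 12,000 GiB. With this configuration, writing results back is an explicit export step; continuous automatic export is configured differently, through a data repository association.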

Storage Decision Framework

Object storage (any size, web-accessible)? → S3
Block storage (single instance)?
  └── High IOPS (>16K)? → io2 Block Express
  └── General workload? → gp3
  └── Throughput-intensive? → st1
Shared file system?
  └── Windows? → FSx for Windows
  └── HPC/ML? → FSx for Lustre
  └── General Linux? → EFS or FSx for OpenZFS
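
The decision tree above can be written as a small function, which makes the branch order explicit. This is a sketch of the framework as stated, with hypothetical parameter names; it checks the IOPS threshold first, then the throughput flag, and falls back to gp3 for general block workloads.

```python
def choose_storage(kind: str, *, iops: int = 0, throughput_heavy: bool = False,
                   platform: str = "linux", hpc: bool = False) -> str:
    """Map workload traits to a storage service, following the tree above."""
    if kind == "object":
        return "S3"
    if kind == "block":
        if iops > 16_000:            # beyond what gp3 can provide
            return "io2 Block Express"
        if throughput_heavy:         # large sequential reads and writes
            return "st1"
        return "gp3"
    if kind == "file":
        if platform == "windows":
            return "FSx for Windows"
        if hpc:
            return "FSx for Lustre"
        return "EFS or FSx for OpenZFS"
    raise ValueError(f"unknown storage kind: {kind!r}")
```

For instance, `choose_storage("block", iops=50_000)` returns `"io2 Block Express"`, and `choose_storage("file", hpc=True)` returns `"FSx for Lustre"`.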

Cost Tip: Always consider S3 Intelligent-Tiering for data with unknown access patterns - it automatically optimizes costs.

Next, we'll cover AWS networking fundamentals.