AWS Architecture & Services Deep Dive
AWS Storage Services: S3, EBS, EFS & FSx
Storage selection impacts performance, cost, and availability. Interviewers frequently ask about trade-offs between storage options.
S3: Object Storage
Storage Classes & Use Cases
| Class | Retrieval Time | Use Case | Storage cost (USD per GB-month, us-east-1) |
|---|---|---|---|
| Standard | Immediate | Frequently accessed | $0.023 |
| Intelligent-Tiering | Immediate | Unknown or changing access patterns | $0.023 (Frequent Access tier) + $0.0025 per 1,000 objects monitoring |
| Standard-IA | Immediate | Infrequent, quick access needed | $0.0125 |
| One Zone-IA | Immediate | Reproducible data, cost-sensitive | $0.01 |
| Glacier Instant | Milliseconds | Archive, immediate access | $0.004 |
| Glacier Flexible | Minutes to 12 hours (expedited, standard, bulk) | Archive, flexible retrieval | $0.0036 |
| Glacier Deep Archive | 12-48 hours | Long-term archive | $0.00099 |
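The per-GB prices translate directly into a monthly bill. A minimal sketch, assuming the us-east-1 list prices in the table and counting storage only; request, retrieval, minimum-duration and minimum-object-size charges are ignored, and they often dominate for the IA and Glacier classes. Intelligent-Tiering is left out because its cost depends on how objects spread across tiers.

```python
# Storage-only list prices in USD per GB-month, taken from the table
# (us-east-1 at the time of writing; verify against current pricing).
STORAGE_PRICE_PER_GB = {
    "Standard": 0.023,
    "Standard-IA": 0.0125,
    "One Zone-IA": 0.01,
    "Glacier Instant": 0.004,
    "Glacier Flexible": 0.0036,
    "Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb: float) -> dict[str, float]:
    """Storage-only monthly cost of size_gb in each class, rounded to cents.

    Excludes request, retrieval, minimum-duration and minimum-size charges.
    """
    return {cls: round(size_gb * price, 2) for cls, price in STORAGE_PRICE_PER_GB.items()}


if __name__ == "__main__":
    for cls, cost in monthly_storage_cost(10_000).items():  # 10,000 GB
        print(f"{cls:<22} ${cost:>8.2f}")
```

For 10,000 GB this gives $230 per month in Standard versus about $9.90 in Deep Archive, which is why lifecycle rules that move cold data down the table pay off quickly.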
Interview Question: S3 Performance
Q: "Your application needs to read 50,000 objects from S3 in under 60 seconds. How do you optimize?"
A: S3 performance optimization strategies:
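- Prefix parallelization: S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix; spread keys across prefixes when you need more
- S3 Transfer Acceleration: Routes transfers through CloudFront edge locations; helps most when clients are far from the bucket's Region
- Byte-range fetches: Download large objects in parallel chunks using Range GET requests
- Request parallelization: Use many concurrent connections (50,000 objects ÷ 60 s ≈ 833 GET/s, well within a single prefix's 5,500 GET/s)
- S3 Select: Filter data inside objects to reduce transfer (closed to new customers since July 2024; Amazon Athena is the usual alternative)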
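The arithmetic behind request parallelization is Little's law: required concurrency = request rate × average latency per request. A minimal sketch, assuming an average GET latency you would measure yourself (50 ms below is a placeholder, not an S3 figure). `plan_reads` and `parallel_fetch` are hypothetical helpers; the `fetch` callable stands in for whatever client call you use, for example a wrapper around boto3's `s3.get_object(Bucket=..., Key=...)["Body"].read()`.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

GET_LIMIT_PER_PREFIX = 5_500  # S3 GET/HEAD requests per second per prefix


def plan_reads(num_objects: int, deadline_s: float, latency_s: float) -> dict[str, float]:
    """Size a parallel read job with Little's law (concurrency = rate x latency).

    latency_s is an assumed average time per GET; measure it for your objects.
    """
    rate = num_objects / deadline_s
    return {
        "requests_per_second": rate,
        "concurrency": math.ceil(rate * latency_s),
        "prefixes_needed": math.ceil(rate / GET_LIMIT_PER_PREFIX),
    }


def parallel_fetch(keys: Iterable[str], fetch: Callable[[str], bytes], workers: int) -> dict[str, bytes]:
    """Fetch all keys with a thread pool; `fetch` is supplied by the caller."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its body
        return dict(zip(key_list, pool.map(fetch, key_list)))


if __name__ == "__main__":
    print(plan_reads(50_000, 60, latency_s=0.05))
```

With 50 ms per GET the plan calls for about 42 requests in flight and a single prefix. If you back `fetch` with boto3, raise botocore's `max_pool_connections` (default 10) to at least the worker count, or the threads will queue for connections.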
S3 Security Best Practices
- Enable S3 Block Public Access at account level
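- Write bucket policies that grant least privilege
- Encrypt at rest: SSE-S3 is applied to all new objects by default (since January 2023); choose SSE-KMS when you need key-level access control and auditing of key use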
- Enable versioning for critical data
- Use S3 Access Points for multi-tenant access control
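A least-privilege bucket policy names one principal, the narrowest actions, and the narrowest resources. A minimal sketch that builds such a policy as a Python dict; the bucket name, prefix and role ARN are hypothetical placeholders. Note that `s3:GetObject` applies to object ARNs while `s3:ListBucket` applies to the bucket ARN, so listing is scoped with the `s3:prefix` condition key instead.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str, role_arn: str) -> dict:
    """Bucket policy letting one IAM role list and read a single prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Sid": "ListOnlyThatPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical names for illustration only
    policy = read_only_prefix_policy(
        "example-reports-bucket", "reports/", "arn:aws:iam::111122223333:role/ReportReader"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied with `aws s3api put-bucket-policy --bucket <name> --policy file://policy.json`.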
EBS: Block Storage
Volume Types Comparison
| Type | IOPS | Throughput | Use Case |
|---|---|---|---|
| gp3 | 3,000 baseline, up to 16,000 | 125 MB/s baseline, up to 1,000 MB/s | General workloads, boot volumes |
| gp2 | 3 IOPS/GiB (min 100, max 16,000); burst to 3,000 below 1 TiB | Up to 250 MB/s | Legacy, burstable workloads |
| io2 Block Express | Up to 256,000 | Up to 4,000 MB/s | Critical databases, SAP HANA |
| st1 | Up to 500 (1 MiB I/O) | Up to 500 MB/s | Big data, log processing |
| sc1 | Up to 250 (1 MiB I/O) | Up to 250 MB/s | Cold data, infrequent access |
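The practical difference between gp2 and gp3 is how IOPS relate to size: gp2 earns 3 IOPS per GiB (minimum 100, maximum 16,000) and smaller volumes burst to 3,000 on credits, while gp3 starts at 3,000 IOPS at any size and can be provisioned up to 500 IOPS per GiB, capped at 16,000. A minimal sketch of those rules; the function names are hypothetical.

```python
def gp2_iops(size_gib: int) -> tuple[int, int]:
    """(baseline, burst) IOPS for a gp2 volume.

    Baseline is 3 IOPS/GiB clamped to [100, 16,000]; volumes whose baseline is
    below 3,000 (i.e. under 1,000 GiB) can burst to 3,000 while credits last.
    """
    baseline = min(max(3 * size_gib, 100), 16_000)
    burst = max(baseline, 3_000)
    return baseline, burst


def gp3_max_iops(size_gib: int) -> int:
    """Highest IOPS a gp3 volume of this size can be provisioned with.

    3,000 is included at any size; above that, at most 500 IOPS/GiB up to 16,000.
    """
    return max(3_000, min(500 * size_gib, 16_000))


if __name__ == "__main__":
    for size in (10, 100, 1_000, 6_000):
        print(size, "GiB  gp2 (baseline, burst):", gp2_iops(size), " gp3 max:", gp3_max_iops(size))
```

A 100 GiB gp2 boot volume has only a 300 IOPS baseline and depends on burst credits to reach 3,000, whereas the same gp3 volume sustains 3,000 IOPS without credits; this is one reason gp3 is the usual default.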
Interview Question: EBS vs Instance Store
Q: "When would you use instance store instead of EBS?"
A: Instance store (ephemeral) suits:
- Temporary data: Scratch space, buffers, caches
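- High IOPS needs: NVMe instance store on storage-optimized families such as i3en delivers up to about 2 million random IOPS on the largest sizes, far beyond EBS limits
- Cost sensitivity: Storage is included in the instance price
- Distributed systems: Where data is replicated elsewhere (Cassandra, Kafka)
Critical: Data is lost when the instance stops, hibernates, or terminates, or when the underlying host fails (it survives a reboot). Never use it for durable storage.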
EBS Optimization Pattern
Database Tier:
- Primary: io2 Block Express (up to 256K IOPS, Multi-Attach disabled)
- Replicas: gp3 (16K IOPS, cost-optimized)
Application Tier:
- Boot: gp3 (3,000 IOPS baseline)
- Data: Based on workload
Archive:
- Move rarely restored snapshots to the EBS Snapshots Archive tier (lower storage cost, 24-72 hour restore)
EFS: Elastic File System
Performance Modes
| Mode | Use Case | Latency |
|---|---|---|
| General Purpose | Web serving, CMS, containers | Low (sub-ms) |
| Max I/O | Highly parallel big data, media processing (legacy; AWS now recommends General Purpose for all file systems) | Higher (ms) |
Throughput Modes
| Mode | Behavior | Use Case |
|---|---|---|
| Bursting | Scales with storage size | Variable workloads |
| Provisioned | Fixed throughput | Consistent performance |
| Elastic | Auto-scales throughput | Unpredictable workloads |
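"Scales with storage size" in Bursting mode means a small file system gets very little sustained throughput. A minimal sketch, assuming the documented Bursting rules of a 50 KiB/s per GiB baseline and a burst ceiling of 100 MiB/s or 100 MiB/s per TiB, whichever is higher; check the EFS documentation for current figures.

```python
def efs_bursting_throughput(size_gib: float) -> tuple[float, float]:
    """(baseline, burst) throughput in MiB/s for an EFS file system in Bursting mode.

    Assumes 50 KiB/s of baseline per GiB stored, and bursting to
    max(100 MiB/s, 100 MiB/s per TiB stored) while burst credits remain.
    """
    baseline_mib_s = size_gib * 50 / 1024          # KiB/s -> MiB/s
    burst_mib_s = max(100.0, size_gib / 1024 * 100)  # GiB -> TiB, then 100 MiB/s per TiB
    return baseline_mib_s, burst_mib_s


if __name__ == "__main__":
    for size in (100, 1024, 2048):
        base, burst = efs_bursting_throughput(size)
        print(f"{size:>5} GiB: baseline {base:6.1f} MiB/s, burst {burst:6.1f} MiB/s")
```

A 100 GiB file system sustains under 5 MiB/s once its credits run out, which is why small but busy file systems are usually better served by Elastic or Provisioned throughput.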
Interview Question: EFS vs EBS
Q: "Your application runs on 5 EC2 instances and needs shared file storage. Compare EFS and EBS."
A:
| Factor | EFS | EBS Multi-Attach |
|---|---|---|
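| Sharing | Thousands of instances (NFS clients) | One io1/io2 volume on up to 16 Nitro instances |
| Protocol | NFS (POSIX-compliant) | Block-level |
| Availability scope | Multi-AZ (Regional) or One Zone | Single AZ |
| Use case | Shared content, CMS | Clustered databases |
| Cost | Higher ($0.30/GB-month, Standard, billed on data stored) | Lower per GB ($0.125/GB-month for io2 plus provisioned IOPS, billed on provisioned size) |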
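Recommendation: Use EFS for true shared file system needs. Use EBS Multi-Attach only for cluster-aware applications that coordinate concurrent writes themselves; standard file systems such as ext4 or XFS are not safe to mount from several instances at once.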
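Comparing headline per-GB prices can mislead because EFS bills for data actually stored while an io2 volume bills for provisioned size and provisioned IOPS. A minimal sketch, assuming us-east-1 list prices and the first io2 IOPS pricing tier ($0.065 per IOPS-month, an assumption to verify); EFS throughput charges and snapshot costs are ignored.

```python
EFS_STANDARD_PER_GB = 0.30  # USD per GB-month of data actually stored
IO2_PER_GB = 0.125          # USD per GB-month of provisioned capacity
IO2_PER_IOPS = 0.065        # USD per provisioned IOPS-month, first tier (assumption)


def efs_monthly(used_gb: float) -> float:
    """EFS Standard storage bill: pay only for bytes stored."""
    return used_gb * EFS_STANDARD_PER_GB


def io2_monthly(provisioned_gb: float, provisioned_iops: int) -> float:
    """io2 bill: pay for provisioned size and IOPS, however full the volume is."""
    return provisioned_gb * IO2_PER_GB + provisioned_iops * IO2_PER_IOPS


if __name__ == "__main__":
    print(f"EFS, 500 GB stored:         ${efs_monthly(500):.2f}")
    print(f"io2, 1,000 GB / 3,000 IOPS: ${io2_monthly(1_000, 3_000):.2f}")
```

With 500 GB in use, EFS comes to about $150 per month, while a half-empty 1,000 GB io2 volume with 3,000 provisioned IOPS comes to about $320, so utilization and IOPS can outweigh the per-GB difference.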
FSx: Managed File Systems
FSx Options
| Service | File System | Use Case |
|---|---|---|
| FSx for Windows File Server | NTFS (SMB) | Windows workloads, AD integration |
| FSx for Lustre | Lustre | HPC, ML training, video rendering |
| FSx for NetApp ONTAP | ONTAP | Enterprise NAS replacement |
| FSx for OpenZFS | ZFS | Linux/NFS workloads |
Interview Question: FSx for Lustre with S3
Q: "You need to process 10TB of data from S3 with high throughput for ML training. What architecture?"
A: Use FSx for Lustre with S3 integration:
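- Create an FSx for Lustre file system linked to the S3 bucket
- File metadata is imported up front; file contents are lazy-loaded from S3 on first access
- Aggregate throughput scales with file system size (for example, 200 MB/s per TiB for SCRATCH_2), so hundreds of GB/s require a very large file system
- Export results back to S3 through a data repository association (automatic export) or a data repository export task
- Delete the file system after processing; you pay for it only while it exists, which makes short-lived Scratch file systems a good fit for batch jobs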
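Because FSx for Lustre throughput is tied to capacity, size the file system for the larger of the data set and the throughput target. A minimal sketch, assuming the SCRATCH_2 baseline of 200 MB/s per TiB and a 2.4 TiB allocation step; both are parameters to check against the FSx documentation for your deployment type, and the function name is hypothetical.

```python
import math


def lustre_capacity_tib(data_tib: float, target_mb_s: float, mb_s_per_tib: float,
                        increment_tib: float = 2.4) -> float:
    """Smallest capacity (TiB) that both holds the data and meets a throughput target.

    mb_s_per_tib is the per-TiB baseline for the deployment type (e.g. 200 for
    SCRATCH_2); increment_tib is an assumed allocation step.
    """
    needed = max(data_tib, target_mb_s / mb_s_per_tib)
    # subtract a tiny epsilon so exact multiples are not bumped up by float rounding
    return math.ceil(needed / increment_tib - 1e-9) * increment_tib


if __name__ == "__main__":
    for target in (0, 10_000):  # MB/s: capacity-driven vs throughput-driven sizing
        cap = lustre_capacity_tib(10, target, 200)
        print(f"target {target} MB/s -> {cap:.1f} TiB, ~{cap * 200:,.0f} MB/s baseline")
```

For the 10 TB data set, sizing for capacity alone yields 12 TiB and roughly 2.4 GB/s; a 10 GB/s target requires about 50 TiB, which is the real cost driver for throughput-bound ML training.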
Storage Decision Framework
Object storage (virtually unlimited capacity, HTTP-accessible)? → S3
Block storage (single instance)?
├── High IOPS (>16K)? → io2 Block Express
├── Sequential, throughput-intensive? → st1
└── General workload? → gp3
Shared file system?
├── Windows? → FSx for Windows File Server
├── HPC/ML? → FSx for Lustre
└── General Linux? → EFS or FSx for OpenZFS
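The decision framework is small enough to encode as a function, which makes the branch order explicit. A minimal sketch; the parameter names are hypothetical and the thresholds are the ones in the framework, not AWS rules.

```python
def choose_storage(kind: str, *, iops: int = 0, sequential_throughput: bool = False,
                   os_family: str = "linux", hpc_or_ml: bool = False) -> str:
    """Walk the storage decision framework; kind is object, block or shared_file."""
    if kind == "object":
        return "S3"
    if kind == "block":
        if iops > 16_000:              # beyond gp3's ceiling
            return "io2 Block Express"
        if sequential_throughput:      # large sequential I/O, e.g. logs, big data
            return "st1"
        return "gp3"                   # general-purpose default
    if kind == "shared_file":
        if os_family == "windows":
            return "FSx for Windows File Server"
        if hpc_or_ml:
            return "FSx for Lustre"
        return "EFS or FSx for OpenZFS"
    raise ValueError(f"unknown storage kind: {kind!r}")


if __name__ == "__main__":
    print(choose_storage("block", iops=50_000))
    print(choose_storage("shared_file", hpc_or_ml=True))
```

Checking IOPS before throughput mirrors the tree: a database needing more than 16K IOPS should not be routed to st1 just because it also streams data.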
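Cost Tip: Consider S3 Intelligent-Tiering for data with unknown or changing access patterns; it moves objects between access tiers automatically with no retrieval fees. Objects smaller than 128 KB are never auto-tiered and are always billed at the Frequent Access rate, and larger objects incur a per-object monitoring fee.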
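The monitoring fee means Intelligent-Tiering only pays off above a certain object size. A minimal sketch of the break-even point, assuming us-east-1 prices ($0.0025 per 1,000 objects monitored, $0.023 Frequent Access, $0.0125 Infrequent Access, $0.004 Archive Instant Access), binary units, and steady state after an object has moved down a tier:

```python
MONITORING_PER_OBJECT = 0.0025 / 1_000  # USD per monitored object per month
FREQUENT_TIER = 0.023                   # USD per GB-month, Frequent Access tier
LOWER_TIERS = {"Infrequent Access": 0.0125, "Archive Instant Access": 0.004}
KB_PER_GB = 1_048_576                   # binary units (assumption)


def break_even_kb(tier_price: float) -> float:
    """Object size at which a month in a cheaper tier saves exactly the monitoring fee."""
    return MONITORING_PER_OBJECT / (FREQUENT_TIER - tier_price) * KB_PER_GB


if __name__ == "__main__":
    for tier, price in LOWER_TIERS.items():
        print(f"{tier}: objects above ~{break_even_kb(price):.0f} KB cover their monitoring fee")
```

Objects of roughly 128-250 KB can cost more to monitor than they save in the Infrequent Access tier, so buckets dominated by small objects may be cheaper left in Standard.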
Next, we'll cover AWS networking fundamentals.