Lab

Design a URL Shortener Service

25 min
Intermediate

Instructions

Objective

Design and implement a URL shortener service that handles 100 million new URLs per month with a 10:1 read-to-write ratio and 5-year data retention. This lab tests your ability to combine API design, encoding algorithms, database modeling, caching, analytics, and scaling reasoning into a cohesive system.

Capacity Estimation

Before writing any code, complete the capacity estimation section in the starter code. Use these parameters:

  • New URLs per month: 100 million
  • Read-to-write ratio: 10:1
  • Average URL record size: 500 bytes (short code + long URL + metadata)
  • Retention period: 5 years
  • Cache hit ratio target: 80% (Pareto distribution — 20% of URLs get 80% of traffic)

You should calculate:

  • Write QPS (average and peak at 3x)
  • Read QPS (average and peak at 3x)
  • Total storage required over 5 years
  • Daily cache memory requirement
  • Daily bandwidth for reads
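
A worked sketch of the arithmetic, under the stated parameters (30-day months, 500-byte records); your submission should show the same formulas with units labeled:

```typescript
// Capacity estimation sketch — all figures are back-of-envelope.
const URLS_PER_MONTH = 100_000_000;
const SECONDS_PER_MONTH = 30 * 24 * 3600;              // ~2.59M seconds
const READ_WRITE_RATIO = 10;
const RECORD_BYTES = 500;
const RETENTION_YEARS = 5;

const writeQps = URLS_PER_MONTH / SECONDS_PER_MONTH;   // ~38.6 ≈ 40 writes/s
const peakWriteQps = writeQps * 3;                     // ~116 ≈ 120 writes/s
const readQps = writeQps * READ_WRITE_RATIO;           // ~386 ≈ 400 reads/s
const peakReadQps = readQps * 3;                       // ~1,160 ≈ 1,200 reads/s

const totalUrls = URLS_PER_MONTH * 12 * RETENTION_YEARS; // 6 billion records
const storageBytes = totalUrls * RECORD_BYTES;           // 3 TB over 5 years
const readsPerDay = readQps * 86_400;                    // ~33 million/day
const cacheBytes = readsPerDay * 0.2 * RECORD_BYTES;     // ~3.3 GB/day (hot 20%)
const bandwidthPerDay = readsPerDay * RECORD_BYTES;      // ~16.7 GB/day for reads
```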

Requirements

1. API Endpoints

Design a RESTful API with these endpoints:

  • POST /api/shorten — Create a short URL from a long URL
    • Request body: { "long_url": "https://...", "custom_alias?": "my-link", "expires_at?": "2027-01-01" }
    • Response: { "short_url": "https://short.url/Ab3xK9z", "short_code": "Ab3xK9z", "expires_at": "2027-01-01" }
  • GET /:shortCode — Redirect to the original URL (HTTP 301 or 302)
  • GET /api/stats/:shortCode — Get analytics for a short URL
    • Response: { "total_clicks": 1523, "clicks_by_day": [...], "top_referrers": [...], "top_countries": [...] }
  • DELETE /api/urls/:shortCode — Delete a short URL (owner only)
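
As a sketch of the contract (Express is an assumed framework choice; createShortUrl and resolveShortCode are hypothetical service calls you would implement), the creation and redirect endpoints might look like:

```typescript
import express from "express";

const app = express();
app.use(express.json());

interface ShortenRequest {
  long_url: string;
  custom_alias?: string; // optional vanity code
  expires_at?: string;   // optional ISO date
}

app.post("/api/shorten", async (req, res) => {
  const body = req.body as ShortenRequest;
  if (!body.long_url || !/^https?:\/\//.test(body.long_url)) {
    return res.status(400).json({ error: "long_url must be a valid http(s) URL" });
  }
  const shortCode = await createShortUrl(body); // hypothetical service call
  return res.status(201).json({
    short_url: `https://short.url/${shortCode}`,
    short_code: shortCode,
    expires_at: body.expires_at ?? null,
  });
});

// Register all /api/* routes before this catch-all so they are not shadowed.
app.get("/:shortCode", async (req, res) => {
  const longUrl = await resolveShortCode(req.params.shortCode); // hypothetical lookup
  if (!longUrl) return res.status(404).json({ error: "not found" });
  return res.redirect(302, longUrl);
});
```

The 301-vs-302 choice is a design decision worth justifying in your submission: 301 lets browsers cache the redirect (less server load, but you lose click analytics on repeat visits), while 302 routes every click through your servers.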

2. Base62 Encoding Logic

Implement a Base62Encoder class that:

  • Converts a numeric ID to a Base62 string (characters: a-z, A-Z, 0-9)
  • Converts a Base62 string back to a numeric ID
  • Generates codes of exactly 7 characters (pad with leading "a" characters, which encode 0, if needed)
  • Handles the full range: 0 to 62^7 - 1 (about 3.5 trillion IDs)
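
A minimal encoder sketch; note the alphabet order matches the spec (lowercase first), so "a" encodes 0 and doubles as the pad character:

```typescript
class Base62Encoder {
  private static readonly ALPHABET =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  private static readonly CODE_LENGTH = 7;

  // Repeated division by 62, same idea as converting decimal to hexadecimal.
  encode(id: number): string {
    if (id < 0 || id >= 62 ** Base62Encoder.CODE_LENGTH) {
      throw new RangeError("id outside the 7-character Base62 range");
    }
    let n = id;
    let code = "";
    while (n > 0) {
      code = Base62Encoder.ALPHABET[n % 62] + code;
      n = Math.floor(n / 62);
    }
    return code.padStart(Base62Encoder.CODE_LENGTH, "a"); // "a" encodes 0
  }

  decode(code: string): number {
    let n = 0;
    for (const ch of code) {
      const digit = Base62Encoder.ALPHABET.indexOf(ch);
      if (digit === -1) throw new Error(`invalid Base62 character: ${ch}`);
      n = n * 62 + digit;
    }
    return n;
  }
}

// Example round trip: 125 = 2*62 + 1 → "cb" → padded to "aaaaacb".
// 62^7 - 1 (~3.5e12) is well below Number.MAX_SAFE_INTEGER, so plain
// numbers stay exact here; no BigInt is needed.
```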

3. Database Schema

Design a schema that supports:

  • Storing URL mappings with metadata (creator, creation date, expiration)
  • Efficient lookup by short code (primary access pattern)
  • Analytics event storage (click timestamp, IP, user agent, referrer, country)
  • Rate limiting state per API key

Provide schemas for both PostgreSQL (relational) and DynamoDB (NoSQL) so the grader can evaluate your understanding of both paradigms.
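A starting point for the PostgreSQL side (column choices beyond the spec are assumptions). For DynamoDB, a reasonable analogue is short_code as the partition key on the URLs table, and short_code plus a click-timestamp sort key on the events table:

```typescript
// PostgreSQL DDL, embedded as a string for the single-file submission.
export const POSTGRES_SCHEMA = `
CREATE TABLE urls (
  id          BIGSERIAL PRIMARY KEY,
  short_code  VARCHAR(10) NOT NULL UNIQUE,  -- unique index serves the primary lookup
  long_url    TEXT        NOT NULL,
  created_by  VARCHAR(64),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  expires_at  TIMESTAMPTZ
);

CREATE TABLE click_events (
  id          BIGSERIAL PRIMARY KEY,
  short_code  VARCHAR(10) NOT NULL REFERENCES urls(short_code),
  clicked_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  ip          INET,
  user_agent  TEXT,
  referrer    TEXT,
  country     CHAR(2)
);

-- Composite index supports the clicks-by-day aggregation per short code.
CREATE INDEX idx_clicks_code_time ON click_events (short_code, clicked_at);
`;
```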

4. Redis Caching Layer

Implement a CacheService class that:

  • Uses cache-aside pattern for URL lookups
  • Sets TTL based on URL popularity (hot URLs get longer TTL)
  • Handles cache invalidation on URL deletion
  • Includes cache warming for newly created URLs that are expected to be popular
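
One way to sketch the cache-aside flow with ioredis (an assumed client library; fetchFromDb and the popularity thresholds are placeholders to replace with your own):

```typescript
import Redis from "ioredis";

const redis = new Redis();
const BASE_TTL_SECONDS = 3600;      // cold URLs: 1 hour
const HOT_TTL_SECONDS = 24 * 3600;  // hot URLs: 24 hours
const HOT_THRESHOLD = 100;          // clicks/day before a URL counts as "hot"

export class CacheService {
  async getLongUrl(shortCode: string): Promise<string | null> {
    const cached = await redis.get(`url:${shortCode}`);
    if (cached !== null) return cached;               // cache hit

    const longUrl = await fetchFromDb(shortCode);     // hypothetical DB lookup
    if (longUrl === null) return null;

    // Popularity-based TTL: hot URLs stay cached longer.
    const clicksToday = Number(await redis.get(`clicks:${shortCode}`)) || 0;
    const ttl = clicksToday >= HOT_THRESHOLD ? HOT_TTL_SECONDS : BASE_TTL_SECONDS;
    await redis.set(`url:${shortCode}`, longUrl, "EX", ttl);
    return longUrl;
  }

  async invalidate(shortCode: string): Promise<void> {
    await redis.del(`url:${shortCode}`); // called from the DELETE endpoint
  }

  async warm(shortCode: string, longUrl: string): Promise<void> {
    await redis.set(`url:${shortCode}`, longUrl, "EX", BASE_TTL_SECONDS);
  }
}
```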

5. Analytics Tracking

Implement a ClickTracker that:

  • Records each redirect as an analytics event (non-blocking, fire-and-forget)
  • Extracts geo-location from IP (stub the geo lookup)
  • Parses user agent for device/browser info
  • Batches events for efficient database writes
  • Provides aggregated stats (clicks by day, top referrers, top countries)
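
A buffered, fire-and-forget tracker sketch; the geo lookup is stubbed per the spec, and insertClickEvents is a hypothetical batch write (a DB bulk INSERT or a Kafka producer):

```typescript
interface ClickEvent {
  shortCode: string;
  clickedAt: Date;
  ip: string;
  userAgent: string;
  referrer: string | null;
  country: string;
}

export class ClickTracker {
  private buffer: ClickEvent[] = [];
  private readonly flushSize = 500;
  private readonly flushIntervalMs = 5_000;

  constructor() {
    // Time-based flush so low-traffic periods still drain the buffer.
    setInterval(() => void this.flush(), this.flushIntervalMs);
  }

  // Called from the redirect handler and never awaited,
  // so tracking can't slow the redirect path down.
  track(shortCode: string, ip: string, userAgent: string, referrer: string | null): void {
    this.buffer.push({
      shortCode,
      clickedAt: new Date(),
      ip,
      userAgent,
      referrer,
      country: this.geoLookup(ip),
    });
    if (this.buffer.length >= this.flushSize) void this.flush();
  }

  private geoLookup(_ip: string): string {
    return "US"; // stub — swap in a real GeoIP database
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.buffer.length);
    await insertClickEvents(batch); // hypothetical batch write
  }
}
```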

6. Rate Limiting

Implement a token bucket rate limiter that:

  • Limits URL creation to 10 per minute per API key
  • Limits redirects to 100 per minute per IP (prevent abuse)
  • Uses Redis for distributed state
  • Returns proper HTTP 429 response with Retry-After header
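
Following the Lua hint below, a token-bucket sketch that keeps the refill-and-take step atomic in Redis (ioredis assumed; key naming and the 120-second key expiry are illustrative choices):

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Refill the bucket based on elapsed time, then try to take one token.
// Returns 1 if the request is allowed, 0 if it should be rate-limited.
const TOKEN_BUCKET_LUA = `
local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or ARGV[1])
local ts     = tonumber(redis.call('HGET', KEYS[1], 'ts') or ARGV[3])
tokens = math.min(tonumber(ARGV[1]), tokens + (tonumber(ARGV[3]) - ts) * tonumber(ARGV[2]))
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', ARGV[3])
redis.call('EXPIRE', KEYS[1], 120)
return allowed
`;

export async function allowRequest(
  key: string,             // e.g. "rl:create:<api_key>" or "rl:redirect:<ip>"
  capacity: number,        // 10 for creation, 100 for redirects
  refillPerSecond: number, // capacity / 60 for per-minute limits
): Promise<boolean> {
  const nowSeconds = Date.now() / 1000;
  const allowed = (await redis.eval(
    TOKEN_BUCKET_LUA, 1, key, capacity, refillPerSecond, nowSeconds,
  )) as number;
  return allowed === 1;
}
```

On a false result, the handler would respond 429 and set Retry-After to the seconds until one token refills (1 / refillPerSecond in the worst case).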

7. Horizontal Scaling Plan

In the ScalingPlan section of your submission, describe (as code comments or a design document string):

  • How you would shard the URLs database (which key, which strategy, why)
  • How you would handle cache consistency across multiple API server instances
  • How you would scale the analytics pipeline to handle 10x traffic
  • What your monitoring and alerting strategy would be (key metrics)
  • How you would handle the "thundering herd" problem for viral URLs
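
For the thundering herd item, one common mitigation worth sketching is request coalescing (single-flight): concurrent cache misses for the same code share a single DB query. A minimal per-instance version, again with fetchFromDb as a hypothetical lookup:

```typescript
// Dedupe concurrent lookups so a viral URL whose cache entry just expired
// triggers one DB query per server instance, not thousands.
const inFlight = new Map<string, Promise<string | null>>();

export function lookupCoalesced(shortCode: string): Promise<string | null> {
  const existing = inFlight.get(shortCode);
  if (existing) return existing;

  const promise = fetchFromDb(shortCode) // hypothetical DB lookup
    .finally(() => inFlight.delete(shortCode));
  inFlight.set(shortCode, promise);
  return promise;
}
```

This only coalesces within one process; across instances you would pair it with a short-lived Redis lock or probabilistic early cache refresh, which your scaling plan should discuss.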

Hints

  • For Base62 encoding, think of it like converting a number to base-62 (similar to converting decimal to hexadecimal, but with 62 digits instead of 16)
  • For the database schema, consider that reads vastly outnumber writes — optimize the read path first
  • For analytics, use a message queue (Kafka) to decouple click recording from the redirect path — the redirect must be fast
  • For rate limiting, use a Lua script in Redis to make the check-and-decrement atomic
  • The scaling plan should address the specific bottleneck at each tier: stateless API servers scale trivially, but the database and cache layers require careful sharding

What to Submit

Your submission is a single TypeScript file in the editor below, containing all eight sections: the capacity estimation plus requirements 1–7.


Grading Rubric

  • API Design (15 points): All four endpoints defined with correct HTTP methods, request/response types, proper status codes (201 for creation, 301/302 for redirect, 429 for rate limit, 404 for not found), and input validation
  • Encoding & Storage (20 points): Base62 encode/decode correctly converts between numeric IDs and 7-character strings, the PostgreSQL schema has proper indexes on short_code, the DynamoDB design uses appropriate partition/sort keys, and both schemas handle the URL mapping and metadata
  • Caching Strategy (20 points): Cache-aside pattern correctly implemented (check cache → miss → query DB → populate cache), popularity-based TTL logic works (hot URLs get longer TTL), cache invalidation on delete, cache warming for new URLs
  • Analytics (15 points): Click events recorded non-blocking with event buffering, batch flushing logic present, geo-lookup stub implemented, aggregated stats function returns clicks by day/referrer/country
  • Scaling Plan (15 points): Covers database sharding (specific key and strategy with justification), cache consistency across instances, analytics pipeline scaling approach, monitoring metrics, and thundering herd mitigation
  • Capacity Estimation (15 points): All calculations correct (write QPS ~40, read QPS ~400, peak values at 3x, total storage ~3 TB, cache ~3.5 GB/day), formulas shown, units labeled
