Large Codebase Management with AI

Context Window Strategies

The Context Challenge

Large codebases present a fundamental challenge for AI assistants:

  • Claude's Context Window: ~200K tokens (Opus 4.5)
  • Typical Large Codebase: 500K - 5M+ lines
  • Lines-to-Tokens Ratio: ~1 line ≈ 3-5 tokens (rough; varies with language and formatting)

A 500K-line codebase works out to roughly 1.5M-2.5M tokens, 7.5 to 12.5 times a 200K-token context window. You can't fit it all in context.
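The arithmetic above as a small sketch. The 3-5 tokens-per-line ratio and the 200K window are this lesson's rough figures, not tokenizer measurements, and the function names are illustrative.

```typescript
// Rough token math for a codebase, using the rule-of-thumb ratio above.
// These are estimates, not tokenizer measurements.
const CONTEXT_WINDOW = 200_000;

function estimateTokenRange(lines: number, minPerLine = 3, maxPerLine = 5): [number, number] {
  return [lines * minPerLine, lines * maxPerLine];
}

// How many full context windows a given token count would fill.
function windowsNeeded(tokens: number, windowSize: number = CONTEXT_WINDOW): number {
  return tokens / windowSize;
}

const [lowTokens, highTokens] = estimateTokenRange(500_000);
console.log(`${lowTokens} - ${highTokens} tokens`); // 1500000 - 2500000 tokens
console.log(`${windowsNeeded(lowTokens)}x - ${windowsNeeded(highTokens)}x the window`); // 7.5x - 12.5x the window
```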

Strategic Context Loading

The Layered Context Approach

Layer 1: Project Overview (always loaded)
├── Architecture docs
├── Directory structure
└── Key interfaces

Layer 2: Domain Context (task-specific)
├── Related modules
├── Type definitions
└── Test examples

Layer 3: Implementation Details (on-demand)
├── Specific files being modified
├── Dependencies of those files
└── Recent changes
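One way to read the layers is as a priority order under a token budget. The sketch below is a hypothetical model, not a description of how Claude Code chooses files: the `ContextItem` shape, the token costs, and the greedy fill are all assumptions.

```typescript
// Hypothetical model of the three layers above.
type Layer = 1 | 2 | 3;

interface ContextItem {
  path: string;
  layer: Layer;
  tokens: number; // estimated cost, e.g. lines * ~4
}

interface ContextPlan {
  loaded: ContextItem[];
  skipped: ContextItem[];
  used: number;
}

// Layer 1 is always loaded; Layer 2, then Layer 3, fill the remaining budget.
function assembleContext(items: ContextItem[], budget: number): ContextPlan {
  const byLayer = [...items].sort((a, b) => a.layer - b.layer); // stable sort keeps order within a layer
  const plan: ContextPlan = { loaded: [], skipped: [], used: 0 };
  for (const item of byLayer) {
    if (item.layer === 1 || plan.used + item.tokens <= budget) {
      plan.loaded.push(item);
      plan.used += item.tokens;
    } else {
      plan.skipped.push(item);
    }
  }
  return plan;
}

const plan = assembleContext(
  [
    { path: "docs/architecture.md", layer: 1, tokens: 3_000 },
    { path: "api/src/services/payment.ts", layer: 3, tokens: 8_000 },
    { path: "packages/types/src/payment.ts", layer: 2, tokens: 1_000 },
  ],
  10_000,
);
// Layer 3 is skipped: 3K + 1K + 8K would exceed the 10K budget.
console.log(plan.loaded.map((item) => item.path));
```

Greedy filling is the simplest policy. Because the loop keeps going after a skip, a small Layer 3 file can still get in after a large one is rejected.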

Implementing Context Layers

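Layer 1: Create a Project Overview

Claude Code automatically loads a CLAUDE.md memory file into context at the start of every session, which makes it the natural home for the always-loaded layer. Keep it short: every line is paid for in every session.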

<!-- .claude/CLAUDE.md -->

# Project Overview

## Architecture
- Next.js 15 App Router frontend
- Node.js/Express API backend
- PostgreSQL + Prisma ORM
- Redis for caching
- S3 for file storage

## Key Directories
- `/src/app` - Next.js pages and routes
- `/src/components` - React components (Atomic Design)
- `/src/lib` - Shared utilities
- `/api/src/routes` - API endpoints
- `/api/src/services` - Business logic
- `/packages/shared` - Shared types and utils

## Core Patterns
- Repository pattern for data access
- Service layer for business logic
- React Query for server state
- Zod for validation

## Entry Points
- Frontend: `src/app/page.tsx`
- API: `api/src/index.ts`
- Workers: `workers/src/index.ts`
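The Key Directories section is easier to keep accurate if it starts from the real tree. A minimal sketch, assuming you pass it the output of `git ls-files` split into lines: it keeps only directories down to a chosen depth, so the overview grows with the number of directories rather than files. `directoryOutline` is a hypothetical helper.

```typescript
// Build a depth-limited directory outline from a flat list of file paths.
function directoryOutline(paths: string[], maxDepth = 2): string[] {
  const dirs = new Set<string>();
  for (const p of paths) {
    const parts = p.split("/").slice(0, -1); // drop the file name
    for (let depth = 1; depth <= Math.min(parts.length, maxDepth); depth++) {
      dirs.add(parts.slice(0, depth).join("/"));
    }
  }
  // Compare segment by segment so "src-old" can't split "src" from "src/app".
  const sorted = Array.from(dirs).sort((a, b) => {
    const pa = a.split("/");
    const pb = b.split("/");
    for (let i = 0; i < Math.min(pa.length, pb.length); i++) {
      if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
    }
    return pa.length - pb.length;
  });
  // Indent two spaces per level and show only the last segment.
  return sorted.map((dir) => {
    const segments = dir.split("/");
    return `${"  ".repeat(segments.length - 1)}${segments[segments.length - 1]}/`;
  });
}

const outline = directoryOutline([
  "src/app/page.tsx",
  "src/lib/db.ts",
  "api/src/index.ts",
  "api/src/routes/checkout.ts",
  "packages/shared/index.ts",
]);
console.log(outline.join("\n"));
// api/
//   src/
// packages/
//   shared/
// src/
//   app/
//   lib/
```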

Layer 2: Domain-Specific Context

claude "I need to work on the payment system.
Load context for:
1. Payment-related types
2. Stripe integration
3. Order processing flow
4. Related tests"

Claude Code reads strategically:

Loading payment domain context:
- packages/types/src/payment.ts (types)
- api/src/services/payment.ts (service)
- api/src/services/stripe.ts (integration)
- api/src/routes/checkout.ts (endpoint)
- tests/payment.test.ts (examples)

Total: 1,200 lines (~5K tokens)
Context remaining: 195K tokens
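The selection above can be approximated locally: filter a file index by domain keywords and total the estimated tokens. A hypothetical sketch; the line counts are invented to add up to the 1,200-line example, and 4 tokens per line is simply the midpoint of the earlier 3-5 range. Path matching is crude: it misses related files whose names don't contain a keyword.

```typescript
// Hypothetical file index entry: path plus line count (e.g. from `wc -l`).
interface IndexedFile {
  path: string;
  lines: number;
}

const TOKENS_PER_LINE = 4; // midpoint of the rough 3-5 range
const WINDOW_TOKENS = 200_000;

// Select files whose path mentions any domain keyword and report the budget.
// Keywords are treated as plain words; escape them if they contain regex characters.
function loadDomain(files: IndexedFile[], keywords: string[]) {
  const pattern = new RegExp(keywords.join("|"), "i");
  const selected = files.filter((f) => pattern.test(f.path));
  const lines = selected.reduce((sum, f) => sum + f.lines, 0);
  const tokens = lines * TOKENS_PER_LINE;
  return { paths: selected.map((f) => f.path), lines, tokens, remaining: WINDOW_TOKENS - tokens };
}

const domain = loadDomain(
  [
    { path: "packages/types/src/payment.ts", lines: 150 },
    { path: "api/src/services/payment.ts", lines: 400 },
    { path: "api/src/services/stripe.ts", lines: 250 },
    { path: "api/src/routes/checkout.ts", lines: 200 },
    { path: "tests/payment.test.ts", lines: 200 },
    { path: "src/components/Button.tsx", lines: 80 },
  ],
  ["payment", "stripe", "checkout"],
);
console.log(domain.lines, domain.tokens, domain.remaining); // 1200 4800 195200
```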

Context Optimization Techniques

1. Interface-First Loading

Load interfaces and types before implementations:

claude "Read only the exported interfaces from
api/src/services/. Don't read the implementations yet."

The AI extracts just the shapes:

interface PaymentService {
  createIntent(amount: number, currency: string): Promise<PaymentIntent>;
  confirmPayment(intentId: string): Promise<PaymentResult>;
  refund(paymentId: string, amount?: number): Promise<RefundResult>;
}

interface OrderService {
  create(items: CartItem[], userId: string): Promise<Order>;
  fulfill(orderId: string): Promise<void>;
  cancel(orderId: string, reason: string): Promise<void>;
}
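A local approximation of interface-first loading: scan a TypeScript file and keep only `export interface` and `export type` declarations. This is a naive brace-counting sketch (braces inside strings or comments will confuse it); a robust tool would walk the syntax tree with the TypeScript compiler API. `extractInterfaces` is a hypothetical name.

```typescript
// Count "{" minus "}" on a line.
function braceDelta(line: string): number {
  return (line.match(/\{/g) || []).length - (line.match(/\}/g) || []).length;
}

// Keep only exported interface/type declarations; skip implementations.
function extractInterfaces(source: string): string[] {
  const lines = source.split("\n");
  const results: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (!/^export\s+(interface|type)\s+\w+/.test(lines[i].trim())) continue;
    const block: string[] = [];
    let depth = 0;
    let j = i;
    for (; j < lines.length; j++) {
      block.push(lines[j]);
      depth += braceDelta(lines[j]);
      // Done when braces balance after a "}", or a type alias ends with ";".
      if (depth === 0 && (lines[j].includes("}") || lines[j].trim().endsWith(";"))) break;
    }
    results.push(block.join("\n"));
    i = j; // resume scanning after this declaration
  }
  return results;
}

const sample = [
  'import Stripe from "stripe";',
  "",
  "export interface PaymentResult {",
  "  ok: boolean;",
  "  id: string;",
  "}",
  "",
  'export type Currency = "usd" | "eur";',
  "",
  "export class StripePaymentService {",
  "  async confirmPayment(intentId: string): Promise<PaymentResult> {",
  "    return { ok: true, id: intentId };",
  "  }",
  "}",
].join("\n");

// Prints the PaymentResult interface and the Currency alias; the class is skipped.
console.log(extractInterfaces(sample).join("\n\n"));
```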

2. Selective File Reading

claude "Read api/src/services/payment.ts but only the
createIntent and confirmPayment methods, skip the
rest of the file."
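The same brace-counting idea can pull individual methods out of a file before it reaches the model. Another naive, hypothetical sketch: `extractMethods` treats the first line that looks like `name(` as the method header, so a bare call at the start of a line could fool it.

```typescript
// Count "{" minus "}" on a line.
function braceBalance(line: string): number {
  return (line.match(/\{/g) || []).length - (line.match(/\}/g) || []).length;
}

// Return only the named methods from a class source, keyed by method name.
function extractMethods(source: string, names: string[]): Map<string, string> {
  const lines = source.split("\n");
  const found = new Map<string, string>();
  for (const methodName of names) {
    // Matches e.g. "  async createIntent(" or "  private static refund(".
    const header = new RegExp(`^\\s*(?:(?:public|private|protected|static|async)\\s+)*${methodName}\\s*\\(`);
    const start = lines.findIndex((line) => header.test(line));
    if (start === -1) continue; // method not present
    const body: string[] = [];
    let depth = 0;
    let opened = false;
    for (let i = start; i < lines.length; i++) {
      body.push(lines[i]);
      depth += braceBalance(lines[i]);
      if (lines[i].includes("{")) opened = true;
      if (opened && depth === 0) break; // method body closed
    }
    found.set(methodName, body.join("\n"));
  }
  return found;
}

const serviceSource = [
  "export class PaymentService {",
  "  async createIntent(amount: number, currency: string) {",
  "    return stripe.paymentIntents.create({ amount, currency });",
  "  }",
  "",
  "  async confirmPayment(intentId: string) {",
  "    return stripe.paymentIntents.confirm(intentId);",
  "  }",
  "",
  "  async refund(paymentId: string) {",
  "    return stripe.refunds.create({ payment_intent: paymentId });",
  "  }",
  "}",
].join("\n");

const picked = extractMethods(serviceSource, ["createIntent", "confirmPayment"]);
console.log(Array.from(picked.values()).join("\n\n")); // refund() is never included
```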

3. Summary-Based Context

claude "Summarize what each file in api/src/services/ does
in one sentence. I'll ask for details on specific ones."

The AI provides a condensed overview:

api/src/services/ summary:
- auth.ts: User authentication with JWT tokens and OAuth
- payment.ts: Stripe payment processing and subscription management
- order.ts: Order lifecycle from creation to fulfillment
- notification.ts: Email and push notification dispatching
- inventory.ts: Stock management and reservation system
- analytics.ts: Event tracking and usage metrics
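If files start with a doc comment, a cheap pre-pass can produce a similar overview without the model reading anything. Note this swaps in a different technique (comment extraction rather than AI summarization) and assumes the file contents are already in memory, for example read with Node's `fs.readFileSync`. `summarize` is a hypothetical helper.

```typescript
// Derive a one-line summary per file from its leading /** ... */ comment.
function summarize(files: Record<string, string>): string[] {
  return Object.keys(files).map((path) => {
    const match = files[path].match(/^\s*\/\*\*([\s\S]*?)\*\//);
    // First non-empty comment line, with the leading "*" stripped.
    const firstLine = match
      ? match[1]
          .split("\n")
          .map((line) => line.replace(/^\s*\*\s?/, "").trim())
          .find((line) => line.length > 0)
      : undefined;
    return `- ${path}: ${firstLine ?? "(no summary)"}`;
  });
}

console.log(
  summarize({
    "auth.ts": "/**\n * User authentication with JWT tokens and OAuth\n */\nexport function login() {}",
    "analytics.ts": "export function track() {}",
  }).join("\n"),
);
// - auth.ts: User authentication with JWT tokens and OAuth
// - analytics.ts: (no summary)
```

Files without a comment fall back to "(no summary)"; those are the ones worth asking the model about.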

4. Diff-Based Context

For changes, load only what's different:

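claude "What changed in api/src/services/ in the last week?
Read only the modified sections."

Behind the scenes, the AI runs commands like:

# Which files changed in the last week
git log --since="1 week ago" --stat --oneline -- api/src/services/

# Then read only the changed hunks, relative to the newest commit older than a week
git diff $(git rev-list -1 --before="1 week ago" HEAD) -- api/src/services/payment.ts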
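The hunk headers in `git diff` output already say which lines changed, so "only the modified sections" can be made concrete. A sketch that parses the standard unified-diff header `@@ -old,count +new,count @@`; the `changedRanges` helper is hypothetical, and its output is the kind of file-plus-lines list you could hand to the model instead of whole files.

```typescript
interface ChangedRange {
  file: string;
  start: number; // first line of the hunk in the new version
  count: number; // lines the hunk spans in the new version (context included; 0 for pure deletions)
}

// Parse unified diff output (e.g. from `git diff`) into changed line ranges.
// Naive: assumes standard "+++ b/<path>" headers and unquoted paths.
function changedRanges(diff: string): ChangedRange[] {
  const ranges: ChangedRange[] = [];
  let file = "";
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ ")) {
      const target = line.match(/^\+\+\+ b\/(.+)$/);
      file = target ? target[1] : ""; // "+++ /dev/null" means the file was deleted
      continue;
    }
    // A missing count (e.g. "@@ -40 +42 @@") means a single line.
    const hunk = line.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/);
    if (hunk && file) {
      ranges.push({ file, start: Number(hunk[1]), count: hunk[2] === undefined ? 1 : Number(hunk[2]) });
    }
  }
  return ranges;
}

const sampleDiff = [
  "diff --git a/api/src/services/payment.ts b/api/src/services/payment.ts",
  "--- a/api/src/services/payment.ts",
  "+++ b/api/src/services/payment.ts",
  "@@ -10,6 +10,8 @@ export class PaymentService {",
  "@@ -40 +42 @@",
].join("\n");

// Two ranges in api/src/services/payment.ts: lines 10-17 and line 42.
console.log(changedRanges(sampleDiff));
```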

Context Checkpointing

Creating Resumable Sessions

claude "We're working on the payment refactor.
Create a checkpoint file with:
1. What we've analyzed so far
2. Files we've modified
3. Next steps
4. Key decisions made"

The resulting checkpoint file might look like this:

<!-- .claude/checkpoints/payment-refactor.md -->

# Payment Refactor Checkpoint

## Date: 2026-01-06
## Session: 3

### Analyzed
- [x] Current payment flow (3 touch points)
- [x] Stripe webhook handling
- [x] Error recovery mechanisms

### Modified
- api/src/services/payment.ts (added retry logic)
- api/src/routes/webhooks.ts (improved idempotency)

### Decisions
1. Use exponential backoff for retries (max 5 attempts)
2. Store failed payments in dead letter queue
3. Add payment state machine for tracking

### Next Steps
1. Implement dead letter queue processor
2. Add monitoring for payment failures
3. Update tests for new retry behavior
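Checkpoints are easier to diff and to resume from if they always have the same sections. A sketch of a renderer for that shape; the `Checkpoint` interface and `renderCheckpoint` are assumptions modeled on the file above, and writing the result to `.claude/checkpoints/` (for example with Node's `fs.writeFileSync`) is left to the caller.

```typescript
// Hypothetical checkpoint shape, mirroring the sections of the file above.
interface Checkpoint {
  title: string;
  date: string; // ISO date, e.g. "2026-01-06"
  session: number;
  analyzed: string[];
  modified: string[];
  decisions: string[];
  nextSteps: string[];
}

// Render a checkpoint as Markdown with a fixed section order.
function renderCheckpoint(c: Checkpoint): string {
  const numbered = (items: string[]) => items.map((item, i) => `${i + 1}. ${item}`);
  return [
    `# ${c.title} Checkpoint`,
    "",
    `## Date: ${c.date}`,
    `## Session: ${c.session}`,
    "",
    "### Analyzed",
    ...c.analyzed.map((item) => `- [x] ${item}`),
    "",
    "### Modified",
    ...c.modified.map((item) => `- ${item}`),
    "",
    "### Decisions",
    ...numbered(c.decisions),
    "",
    "### Next Steps",
    ...numbered(c.nextSteps),
  ].join("\n");
}

console.log(
  renderCheckpoint({
    title: "Payment Refactor",
    date: "2026-01-06",
    session: 3,
    analyzed: ["Current payment flow (3 touch points)", "Stripe webhook handling"],
    modified: ["api/src/services/payment.ts (added retry logic)"],
    decisions: ["Use exponential backoff for retries (max 5 attempts)"],
    nextSteps: ["Implement dead letter queue processor", "Add monitoring for payment failures"],
  }),
);
```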

Resuming Sessions

claude "Read the payment-refactor checkpoint and
continue where we left off."
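Resuming works best when the new session starts from the checkpoint's next steps rather than the whole history. A small sketch for pulling one section out of a checkpoint file like the one above; `readSection` is hypothetical and assumes `###` headings with `1.` or `-` list items.

```typescript
// Return the list items under one "### <heading>" of a checkpoint file.
// The section ends at the next line starting with "#".
function readSection(markdown: string, heading: string): string[] {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === `### ${heading}`);
  if (start === -1) return [];
  const items: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("#")) break;
    // Strip "1. ", "- " and an optional "[x] " checkbox.
    const item = line.replace(/^\s*(?:\d+\.|-)\s*(?:\[[ x]\]\s*)?/, "").trim();
    if (item) items.push(item);
  }
  return items;
}

const checkpointText = [
  "### Decisions",
  "1. Use exponential backoff for retries (max 5 attempts)",
  "",
  "### Next Steps",
  "1. Implement dead letter queue processor",
  "2. Add monitoring for payment failures",
  "3. Update tests for new retry behavior",
].join("\n");

console.log(readSection(checkpointText, "Next Steps"));
```

The three extracted steps are then a compact opening prompt for the next session.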

Token Budget Management

Monitor Your Usage

claude "Estimate how many tokens are in my current context.
List the largest files and suggest what to unload."

Example response (the model's own estimate, so treat the figures as approximate):

Current context estimate: 45,000 tokens

Largest items:
1. api/src/services/payment.ts - 2,100 lines (~8K tokens)
2. packages/types/src/index.ts - 1,500 lines (~6K tokens)
3. Conversation history - (~12K tokens)

Recommendations:
- Unload packages/types, load only payment-related types
- Consider summarizing payment.ts, load only active methods
- Clear completed task context from conversation
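The recommendations above follow a simple rule: drop the largest items you no longer need until you're under budget. A sketch of that rule; `LoadedItem`, the `active` flag, the 30K target, and the "Everything else" entry (added so the total matches 45K) are assumptions, and the token counts are estimates like those in the response above.

```typescript
// Hypothetical inventory of what is currently loaded, with rough token estimates.
interface LoadedItem {
  name: string;
  tokens: number;
  active: boolean; // still needed for the current task?
}

// Suggest items to unload: largest inactive ones first, until under the target.
function unloadCandidates(items: LoadedItem[], targetTokens: number): string[] {
  let total = items.reduce((sum, item) => sum + item.tokens, 0);
  const inactive = items.filter((item) => !item.active).sort((a, b) => b.tokens - a.tokens);
  const candidates: string[] = [];
  for (const item of inactive) {
    if (total <= targetTokens) break;
    candidates.push(item.name);
    total -= item.tokens;
  }
  return candidates;
}

const currentContext: LoadedItem[] = [
  { name: "api/src/services/payment.ts", tokens: 8_000, active: true },
  { name: "packages/types/src/index.ts", tokens: 6_000, active: false },
  { name: "Conversation history", tokens: 12_000, active: false },
  { name: "Everything else (hypothetical)", tokens: 19_000, active: true },
];

// 45K loaded, 30K target: drops the conversation history, then packages/types.
console.log(unloadCandidates(currentContext, 30_000));
```

If dropping every inactive item still leaves you over target, the remaining option is to summarize active files, as the response suggests for payment.ts.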

Unloading Context

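claude "Forget about the user authentication flow,
we're done with that. Focus only on payments now."

This shifts the model's focus, but the earlier messages still occupy the context window. To actually reclaim tokens in Claude Code, run /compact (optionally with focus instructions, e.g. /compact keep only the payment work) or /clear to start a fresh conversation.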

Best Practices for Large Codebases

  1. Start Narrow: Begin with interfaces, expand to implementations
  2. Use Summaries: Get overviews before diving deep
  3. Checkpoint Often: Save progress for resumable sessions
  4. Unload Actively: Clear context when switching domains
  5. Trust the AI: Let it decide what to load based on the task

Coming Up

Next, we'll learn codebase navigation patterns: how to efficiently find and understand code in massive projects.

Quiz

Module 3: Large Codebase Management with AI
