
Test Vercel AI SDK Code with MockLanguageModelV3 (2026)

20 June 2026

To test Vercel AI SDK code without calling a real model, inject the model as a parameter and pass MockLanguageModelV3 from ai/test in your tests. It returns deterministic responses for generateText and streamText, so your suite runs in milliseconds with no API key and no cost.

TL;DR

This hands-on guide adds a real unit-test suite to a TypeScript app built on the Vercel AI SDK. You will scaffold a project with Vitest 4.1.9 [1], make three small functions testable by injecting the model, then mock structured output, multi-step tool calling, and streaming using MockLanguageModelV3 from ai/test on ai 6.0.208 [2]. Every test here was executed on 20 June 2026 against the real library: 3 files, 6 tests, all passing, with tsc --strict clean and no API key. Budget about 30 minutes.

What you'll learn

  • Why AI SDK code is hard to test, and the one pattern that fixes it
  • How to scaffold a TypeScript + Vitest project for the AI SDK
  • How to make your code testable by injecting the model
  • How to mock generateText and assert on a structured output
  • How to test the error path when the model returns garbage
  • How to assert what prompt your code actually sent to the model
  • How to test multi-step tool calling with mockValues
  • How to test streaming responses with simulateReadableStream
  • How to share mocks with a small, typed test kit

Why AI SDK code is hard to test

Language models are non-deterministic, slow, and cost money on every call, so the usual instinct (call the model in a test and assert on the answer) fails on all three counts. Your test would be flaky (the wording changes), slow (a network round-trip), and expensive (tokens per run). The AI SDK solves this with mock providers you can swap in, so you test your own code (prompt assembly, schema parsing, tool wiring, stream handling) without ever reaching a provider. [3]

There is one catch that trips up a lot of older tutorials online: the mock class was renamed. In AI SDK v6, the V2 mock classes were removed from ai/test; you now use MockLanguageModelV3. [4] If you copy an older snippet that imports MockLanguageModelV2, the import will fail on ai 6.x because that class no longer exists. This guide pins v6 and uses the current V3 helpers throughout.

Prerequisites

  • Node.js 18+ (the ai package's engines field allows Node 18 or newer; this guide was verified on Node 22.22.3)
  • npm 10+
  • Familiarity with async/await and basic TypeScript
  • No API key and no provider account — that is the entire point

Step 1: Scaffold the project

Create an empty project and install the AI SDK plus a test runner. Pin the versions so your results match this guide.

mkdir ai-sdk-testing && cd ai-sdk-testing
npm init -y
npm pkg set type=module
npm install ai@6.0.208 @ai-sdk/openai zod
npm install -D vitest@4.1.9 typescript @types/node

Add a strict tsconfig.json. The strict flags matter here: they catch mistakes in your mock shapes before you ever run the tests.

{
  "compilerOptions": {
    "target": "ES2023",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "verbatimModuleSyntax": true,
    "skipLibCheck": true,
    "types": ["node"],
    "noEmit": true
  },
  "include": ["src"]
}

Add a minimal Vitest config and two npm scripts:

// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({ test: { environment: 'node', include: ['src/**/*.test.ts'] } });

npm pkg set scripts.test="vitest run"
npm pkg set scripts.typecheck="tsc --noEmit"

You now have a project that type-checks under strict mode and runs tests with npm test. Nothing here contacts a model provider.

Step 2: Make your code testable — inject the model

The key habit to adopt is to never hard-code the model inside the function you want to test. Instead, accept it as a parameter typed LanguageModel. Production passes a real provider model; tests pass a mock. Here is the first function, a support-ticket triage helper that asks the model for a typed object using generateText with Output.object. [5]

// src/triage.ts
import { generateText, Output, type LanguageModel } from 'ai';
import { z } from 'zod';

export const ticketSchema = z.object({
  category: z.enum(['billing', 'bug', 'feature', 'other']),
  priority: z.enum(['low', 'medium', 'high']),
  summary: z.string(),
});
export type Ticket = z.infer<typeof ticketSchema>;

export async function triageTicket(model: LanguageModel, message: string): Promise<Ticket> {
  const { output } = await generateText({
    model,
    system: 'You are a support triage assistant. Classify the ticket.',
    prompt: message,
    output: Output.object({ schema: ticketSchema }),
  });
  return output;
}

In production you wire the real model in exactly one place, so the rest of your code stays provider-agnostic and testable:

// src/models.ts
import { openai } from '@ai-sdk/openai';
import type { LanguageModel } from 'ai';

// The ONLY place a real provider is referenced. Swap in whatever chat model
// your provider currently exposes — this line is never reached in tests.
export const productionModel: LanguageModel = openai('gpt-4o-mini');

That is the whole trick. Because triageTicket takes model, a test can hand it a fake.

Step 3: Mock generateText and test a structured output

Create src/triage.test.ts. The mock is a MockLanguageModelV3 whose doGenerate returns the response you want the model to "produce." Notice the response shape: content is an array of parts, finishReason is an object { unified, raw }, and usage carries nested token objects. Under strict TypeScript you must include every token field — cacheRead, cacheWrite, and reasoning — even when they are undefined, or the compiler rejects the mock.

// src/triage.test.ts
import { describe, it, expect } from 'vitest';
import { NoObjectGeneratedError } from 'ai';
import { MockLanguageModelV3 } from 'ai/test';
import type { LanguageModelV3Prompt } from '@ai-sdk/provider';
import { triageTicket } from './triage.js';

describe('triageTicket', () => {
  it('parses a structured ticket from the model output', async () => {
    const model = new MockLanguageModelV3({
      doGenerate: async () => ({
        content: [{ type: 'text', text: '{"category":"billing","priority":"high","summary":"Double charged"}' }],
        finishReason: { unified: 'stop', raw: undefined },
        usage: {
          inputTokens: { total: 10, noCache: 10, cacheRead: undefined, cacheWrite: undefined },
          outputTokens: { total: 20, text: 20, reasoning: undefined },
        },
        warnings: [],
      }),
    });
    const ticket = await triageTicket(model, 'I was charged twice this month!');
    expect(ticket).toEqual({ category: 'billing', priority: 'high', summary: 'Double charged' });
  });
});

Run npm test and this passes. The model "returned" a JSON string, the SDK parsed it against your Zod schema, and your function handed back a fully typed Ticket — all offline.

Test the error path

A good test suite proves the unhappy path too. When the model returns text that is not valid JSON, generateText with Output.object throws NoObjectGeneratedError (its error name is AI_NoObjectGeneratedError). Add a test that mocks a non-JSON reply and asserts the rejection. Append it inside the same describe block:

  it('rejects when the model returns text that is not valid JSON', async () => {
    const model = new MockLanguageModelV3({
      doGenerate: async () => ({
        content: [{ type: 'text', text: 'sorry, I cannot help with that' }],
        finishReason: { unified: 'stop', raw: undefined },
        usage: {
          inputTokens: { total: 1, noCache: 1, cacheRead: undefined, cacheWrite: undefined },
          outputTokens: { total: 1, text: 1, reasoning: undefined },
        },
        warnings: [],
      }),
    });
    await expect(triageTicket(model, 'gibberish')).rejects.toBeInstanceOf(NoObjectGeneratedError);
  });

Now you have proof your code surfaces a typed error instead of returning a broken object — and you verified it without spending a token to provoke a real model into misbehaving.
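To make concrete what this test guards against, here is a dependency-free sketch of the parse-or-throw behaviour. The names parseTicket and TicketParseError are hypothetical stand-ins, not AI SDK APIs; in the real code the SDK performs this validation with your Zod schema and throws NoObjectGeneratedError.

```typescript
// Hypothetical stand-in for the SDK's parse step; not an AI SDK API.
type Ticket = {
  category: 'billing' | 'bug' | 'feature' | 'other';
  priority: 'low' | 'medium' | 'high';
  summary: string;
};

// A typed error that keeps the raw model text for debugging.
class TicketParseError extends Error {
  rawText: string;
  constructor(message: string, rawText: string) {
    super(message);
    this.name = 'TicketParseError';
    this.rawText = rawText;
  }
}

const CATEGORIES: readonly string[] = ['billing', 'bug', 'feature', 'other'];
const PRIORITIES: readonly string[] = ['low', 'medium', 'high'];

function parseTicket(rawText: string): Ticket {
  let value: unknown;
  try {
    value = JSON.parse(rawText);
  } catch {
    // Non-JSON text never becomes a half-filled object; it becomes an error.
    throw new TicketParseError('model output is not valid JSON', rawText);
  }
  if (typeof value !== 'object' || value === null) {
    throw new TicketParseError('model output is not a JSON object', rawText);
  }
  const record = value as Record<string, unknown>;
  const category = record['category'];
  const priority = record['priority'];
  const summary = record['summary'];
  if (
    typeof category !== 'string' || !CATEGORIES.includes(category) ||
    typeof priority !== 'string' || !PRIORITIES.includes(priority) ||
    typeof summary !== 'string'
  ) {
    throw new TicketParseError('model output does not match the ticket shape', rawText);
  }
  return {
    category: category as Ticket['category'],
    priority: priority as Ticket['priority'],
    summary,
  };
}
```

The two triage tests so far map onto the two branches of this sketch: valid JSON that matches the shape returns a Ticket, and anything else rejects with a typed error the caller can catch.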

Step 4: Assert what your code sent to the model

Tests should also check the input side: did your function actually send the system prompt and the user's message? The doGenerate callback receives the call options, including the assembled prompt. Capture it and assert. Add this test to the describe block:

  it('sends the system prompt and user message to the model', async () => {
    let captured: LanguageModelV3Prompt | undefined;
    const model = new MockLanguageModelV3({
      doGenerate: async ({ prompt }) => {
        captured = prompt;
        return {
          content: [{ type: 'text', text: '{"category":"bug","priority":"low","summary":"x"}' }],
          finishReason: { unified: 'stop', raw: undefined },
          usage: {
            inputTokens: { total: 1, noCache: 1, cacheRead: undefined, cacheWrite: undefined },
            outputTokens: { total: 1, text: 1, reasoning: undefined },
          },
          warnings: [],
        };
      },
    });
    await triageTicket(model, 'The export button is broken');
    expect(captured?.[0]).toMatchObject({ role: 'system' });
    expect(JSON.stringify(captured)).toContain('The export button is broken');
  });

This is where mocks earn their keep on real applications: if you inject retrieved context for RAG, or build a system prompt from user settings, this pattern proves the right text reached the model — something you cannot reliably assert when the model itself is in the loop.
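JSON.stringify matches the text wherever it appears in the prompt. If you want to assert that a string arrived under a specific role, a small helper can narrow the check. The sketch below uses simplified local types that assume system messages carry a plain string and other messages carry an array of parts; the real LanguageModelV3Prompt has more message and part kinds, and textForRole is a hypothetical name.

```typescript
// Simplified, assumed shapes; the real LanguageModelV3Prompt is richer.
type Part = { type: string; text?: string };
type Message =
  | { role: 'system'; content: string }
  | { role: 'user' | 'assistant'; content: Part[] };

// Collect all text sent under one role, joined with newlines.
function textForRole(prompt: Message[], role: Message['role']): string {
  const pieces: string[] = [];
  for (const message of prompt) {
    if (message.role !== role) continue;
    if (typeof message.content === 'string') {
      pieces.push(message.content);
    } else {
      for (const part of message.content) {
        if (part.type === 'text' && part.text !== undefined) pieces.push(part.text);
      }
    }
  }
  return pieces.join('\n');
}

// Example data shaped like the prompt the triage test captures.
const capturedPrompt: Message[] = [
  { role: 'system', content: 'You are a support triage assistant. Classify the ticket.' },
  { role: 'user', content: [{ type: 'text', text: 'The export button is broken' }] },
];

const userText = textForRole(capturedPrompt, 'user');
const systemText = textForRole(capturedPrompt, 'system');
```

With a helper like this, an assertion such as expect(userText).toBe('The export button is broken') fails if the message leaks into the system prompt instead, which the JSON.stringify check would not notice.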

Step 5: Test tool calling and multi-step agents

Tool-calling agents are among the trickiest things to test because they involve multiple model turns: the model asks to call a tool, your code runs the tool, then the model produces a final answer. Here is a small weather agent whose loop is capped at five steps with stepCountIs(5). [6]

// src/agent.ts
import { generateText, tool, stepCountIs, type LanguageModel } from 'ai';
import { z } from 'zod';

export const weatherTool = tool({
  description: 'Get the current temperature for a city in Celsius.',
  inputSchema: z.object({ city: z.string() }),
  execute: async ({ city }) => ({ city, tempC: 21 }),
});

export interface AgentReply {
  text: string;
  toolsUsed: string[];
}

export async function askWeatherAgent(model: LanguageModel, question: string): Promise<AgentReply> {
  const result = await generateText({
    model,
    tools: { weather: weatherTool },
    stopWhen: stepCountIs(5),
    prompt: question,
  });
  return {
    text: result.text,
    toolsUsed: result.steps.flatMap((s) => s.toolCalls).map((c) => c.toolName),
  };
}

To drive two model turns from one mock, use mockValues, which returns each value in order. Pass it your result objects (the generateResult helper from Step 7 builds them) — the first turn is a tool call, the second is the final text:

// src/agent.test.ts
import { describe, it, expect } from 'vitest';
import { MockLanguageModelV3, mockValues } from 'ai/test';
import { askWeatherAgent } from './agent.js';
import { generateResult } from './testkit.js';

describe('askWeatherAgent', () => {
  it('runs the weather tool then returns a final answer', async () => {
    const model = new MockLanguageModelV3({
      doGenerate: mockValues(
        generateResult([
          { type: 'tool-call', toolCallId: 'c1', toolName: 'weather', input: JSON.stringify({ city: 'Cairo' }) },
        ]),
        generateResult([{ type: 'text', text: 'It is 21C in Cairo.' }]),
      ),
    });
    const reply = await askWeatherAgent(model, 'Weather in Cairo?');
    expect(reply.text).toBe('It is 21C in Cairo.');
    expect(reply.toolsUsed).toEqual(['weather']);
  });

  it('answers directly when no tool is needed', async () => {
    const model = new MockLanguageModelV3({
      doGenerate: async () => generateResult([{ type: 'text', text: 'Ask me about the weather!' }]),
    });
    const reply = await askWeatherAgent(model, 'Hello');
    expect(reply.toolsUsed).toEqual([]);
    expect(reply.text).toBe('Ask me about the weather!');
  });
});

There is a subtle gotcha worth memorizing. After a multi-step run, result.toolCalls and result.toolResults reflect only the final step — which is the text turn — so they are empty. The tool calls live in the earlier step. That is exactly why askWeatherAgent reads result.steps.flatMap((s) => s.toolCalls) instead of result.toolCalls. If your assertion on tool usage keeps coming back empty, this is almost always the cause.
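The difference is easy to see with plain data. This sketch uses simplified, assumed shapes (only the fields the agent reads) rather than the SDK's real result type, and builds a two-step run by hand.

```typescript
// Simplified, assumed shapes standing in for a multi-step generateText result.
type ToolCall = { toolName: string };
type Step = { toolCalls: ToolCall[] };
type Result = { steps: Step[]; toolCalls: ToolCall[] };

// Two steps: a tool call, then the final text turn with no tool calls.
const steps: Step[] = [
  { toolCalls: [{ toolName: 'weather' }] },
  { toolCalls: [] },
];

const lastStep = steps[steps.length - 1];
const result: Result = {
  steps,
  // As described above, the top-level field mirrors the final step only.
  toolCalls: lastStep ? lastStep.toolCalls : [],
};

// Reading the top-level field misses the tool call made in step one.
const fromFinalStep = result.toolCalls.map((c) => c.toolName);

// Flattening every step finds it, which is what askWeatherAgent does.
const fromAllSteps = result.steps.flatMap((s) => s.toolCalls).map((c) => c.toolName);
```

Here fromFinalStep is an empty array while fromAllSteps contains 'weather', matching the behaviour the agent test asserts on.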

Step 6: Test streaming responses

Streaming cannot be faked with a plain resolved value, because streamText returns a result whose textStream is consumed incrementally as an async iterable. The SDK ships simulateReadableStream for this: you provide the stream chunks and it replays them. Here is a streaming summarizer that yields deltas to its caller. [7]

// src/summarize.ts
import { streamText, type LanguageModel } from 'ai';

export async function* streamSummary(model: LanguageModel, text: string): AsyncGenerator<string> {
  const result = streamText({
    model,
    system: 'Summarize the text in one sentence.',
    prompt: text,
  });
  for await (const delta of result.textStream) {
    yield delta;
  }
}

The test mocks doStream with a sequence of text-start, text-delta, text-end, and finish chunks, then collects the yielded pieces and reassembles them:

// src/summarize.test.ts
import { describe, it, expect } from 'vitest';
import { streamSummary } from './summarize.js';
import { mockStream } from './testkit.js';

describe('streamSummary', () => {
  it('yields text deltas and reassembles the full summary', async () => {
    const model = mockStream(['A ', 'short ', 'summary.']);
    const chunks: string[] = [];
    for await (const delta of streamSummary(model, 'long input text')) chunks.push(delta);
    expect(chunks.length).toBeGreaterThan(1);
    expect(chunks.join('')).toBe('A short summary.');
  });
});
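It can help to see what that chunk sequence encodes before reading the test kit. The sketch below is a dependency-free illustration with simplified, assumed chunk shapes (the real stream part type has more variants and fields), and foldChunks is a hypothetical helper, not something the SDK exports.

```typescript
// Simplified, assumed chunk shapes modelled on the ones this guide's mock emits.
type Chunk =
  | { type: 'text-start'; id: string }
  | { type: 'text-delta'; id: string; delta: string }
  | { type: 'text-end'; id: string }
  | { type: 'finish' };

// Reduce a chunk sequence to the pieces a consumer of textStream would see.
function foldChunks(chunks: Chunk[]): { deltas: string[]; text: string; finished: boolean } {
  const deltas: string[] = [];
  let finished = false;
  for (const chunk of chunks) {
    if (chunk.type === 'text-delta') deltas.push(chunk.delta);
    if (chunk.type === 'finish') finished = true;
  }
  return { deltas, text: deltas.join(''), finished };
}

const folded = foldChunks([
  { type: 'text-start', id: 't1' },
  { type: 'text-delta', id: 't1', delta: 'A ' },
  { type: 'text-delta', id: 't1', delta: 'short ' },
  { type: 'text-delta', id: 't1', delta: 'summary.' },
  { type: 'text-end', id: 't1' },
  { type: 'finish' },
]);
```

Only the text-delta chunks carry text; text-start and text-end bracket one text block by id, and finish marks the end of the response. That is why the test expects more than one yielded piece and a joined result of 'A short summary.'.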

Step 7: Share mocks with a test kit

The full mock response is verbose, so centralize it. This testkit.ts defines one complete usage object (satisfying the strict types in a single place), a generateResult builder used with mockValues, mockText for one-shot answers, and mockStream for the streaming case. Making generateResult async means each call returns a promise, which matches what doGenerate is expected to return, so you can pass those calls straight into mockValues while keeping the types happy.

// src/testkit.ts
import { MockLanguageModelV3 } from 'ai/test';
import { simulateReadableStream } from 'ai';
import type {
  LanguageModelV3CallOptions,
  LanguageModelV3Content,
  LanguageModelV3Usage,
} from '@ai-sdk/provider';

// A complete usage object. v6's strict types require every token field,
// so we set the ones we don't care about to undefined in one place.
const usage: LanguageModelV3Usage = {
  inputTokens: { total: 10, noCache: 10, cacheRead: undefined, cacheWrite: undefined },
  outputTokens: { total: 20, text: 20, reasoning: undefined },
};

// One non-streaming response (text or tool calls).
// async so it can be passed straight to mockValues() for multi-step runs.
export async function generateResult(content: LanguageModelV3Content[]) {
  return {
    content,
    finishReason: { unified: 'stop' as const, raw: undefined },
    usage,
    warnings: [],
  };
}

// A mock that returns a single text response and reports each call's options to an optional callback.
export function mockText(text: string, onCall?: (opts: LanguageModelV3CallOptions) => void) {
  return new MockLanguageModelV3({
    doGenerate: async (options) => {
      onCall?.(options);
      return generateResult([{ type: 'text', text }]);
    },
  });
}

// A mock that streams the given text as several deltas.
export function mockStream(deltas: string[]) {
  return new MockLanguageModelV3({
    doStream: async () => ({
      stream: simulateReadableStream({
        chunks: [
          { type: 'text-start', id: 't1' },
          ...deltas.map((delta) => ({ type: 'text-delta' as const, id: 't1', delta })),
          { type: 'text-end', id: 't1' },
          { type: 'finish', finishReason: { unified: 'stop' as const, raw: undefined }, usage },
        ],
      }),
    }),
  });
}

With the kit in place, new tests stay short: mockText('...') for a one-shot answer, mockStream([...]) for streaming, and generateResult([...]) inside mockValues(...) for multi-step agents.

Verification

Type-check and run the suite. With no API key set and no network access to any provider, everything passes:

npm run typecheck   # tsc --noEmit, strict
npm test            # vitest run

Expected output:

 ✓ src/agent.test.ts (2 tests)
 ✓ src/summarize.test.ts (1 test)
 ✓ src/triage.test.ts (3 tests)

 Test Files  3 passed (3)
      Tests  6 passed (6)

Because these tests need no credentials and finish in well under a second, drop npm run typecheck && npm test straight into the same CI job that runs your other unit tests. There is nothing to gate on a secret and nothing to bill, so your AI features get the same pull-request safety net as the rest of the codebase.

Troubleshooting

TS2724: '"ai/test"' has no exported member named 'MockLanguageModelV2'. Did you mean 'MockLanguageModelV3'? The V2 mock classes were removed in AI SDK v6. Import MockLanguageModelV3 instead. The rest of the call shape changed too, so update the response object as shown in Step 3. [4]

TS error: properties cacheRead, cacheWrite are missing. The strict LanguageModelV3Usage type wants every token field. Include cacheRead: undefined and cacheWrite: undefined on inputTokens, and reasoning: undefined on outputTokens, or reuse the single usage constant from the test kit.

Cannot read properties of undefined (reading 'inputTokens') from mockValues. You passed async () => ({...}) functions to mockValues. It returns each value as-is, so the SDK receives a function instead of a result. Pass result objects, or calls to the async generateResult builder such as generateResult([...]), not arrow functions that return them.
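The value-versus-function mix-up is easier to spot with a tiny stand-in. The sequence helper below is hypothetical and written only for illustration; it is not the library's source, and its choice to repeat the last value once the list is exhausted is an assumption of this sketch.

```typescript
// Hypothetical stand-in for a value-sequencing helper; not the library's source.
function sequence<T>(...values: T[]): () => T {
  let index = 0;
  return () => {
    if (values.length === 0) throw new Error('sequence() needs at least one value');
    // Assumption for this sketch: once exhausted, keep returning the last value.
    const value = values[Math.min(index, values.length - 1)] as T;
    index += 1;
    return value;
  };
}

// Correct: values go in, values come out, one per call.
const nextResult = sequence({ text: 'first' }, { text: 'second' });
const firstResult = nextResult();
const secondResult = nextResult();

// The mistake: functions go in, so functions come out, and nothing calls them.
const nextThunk = sequence(() => ({ text: 'first' }));
const notAResult = nextThunk();
```

Here firstResult and secondResult are the objects themselves, while notAResult is still a function. Code that then reads a field such as usage from it gets undefined, which is the shape of the error above.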

result.toolCalls is empty after a tool ran. Those fields hold the final step only, which is the text turn. Read tool calls from the steps: result.steps.flatMap((s) => s.toolCalls).

A streaming test is slow, or finishReason/usage assertions fail. simulateReadableStream delays default to 0, so leave initialDelayInMs/chunkDelayInMs unset in unit tests — non-zero values (often copied from a streaming-UI demo) add up across chunks and can blow past Vitest's timeout. And if you assert on finishReason or usage, include a finish chunk: without one the stream still completes, but finishReason comes back as 'other' and the usage totals are unavailable.

Next steps and further reading

Mock-based unit tests prove your code behaves correctly given a known model response. They do not measure whether your prompts produce good answers from a real model — that is the job of evals, which you can wire into CI as covered in our guide to testing LLM prompts with promptfoo. Use both: unit tests for logic, evals for quality. If you are new to the SDK itself, start with the broader Vercel AI SDK in practice walkthrough, and for general test-design habits the unit testing strategies guide pairs well with everything above.

From here, extend the suite: assert on result.usage.totalTokens to guard cost regressions, test a tool's execute function in isolation, or use mockValues to simulate a model that retries after a malformed call. Each new test costs nothing to run and catches a real class of bug before it reaches a user.
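As one way to approach the cost-regression idea, the sketch below checks a usage object against a token budget. The Usage type is a simplified assumption with optional numeric fields, and assertWithinBudget is a hypothetical helper rather than an SDK function.

```typescript
// Simplified, assumed usage shape; any field may be missing.
type Usage = { inputTokens?: number; outputTokens?: number; totalTokens?: number };

// Throw when a call used more tokens than the test allows.
function assertWithinBudget(usage: Usage, maxTotalTokens: number): void {
  // Prefer the reported total; otherwise fall back to input + output.
  const total = usage.totalTokens ?? ((usage.inputTokens ?? 0) + (usage.outputTokens ?? 0));
  if (total > maxTotalTokens) {
    throw new Error(`token budget exceeded: ${total} > ${maxTotalTokens}`);
  }
}
```

In a test you might call assertWithinBudget(result.usage, 50) after a mocked run. Because the mock's usage numbers are fixed, this mainly guards the number of model calls your code makes rather than real token counts.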

Footnotes

  1. Vitest on npm (4.1.9). https://www.npmjs.com/package/vitest

  2. ai on npm — 6.0.208 is the latest dist-tag as of 20 June 2026. https://www.npmjs.com/package/ai

  3. Vercel — AI SDK Core: Testing (mock providers and helpers: MockLanguageModelV3, mockId, mockValues, simulateReadableStream). https://ai-sdk.dev/docs/ai-sdk-core/testing

  4. Vercel — Migrate AI SDK 5.x to 6.0 (V2 mock classes removed from ai/test; use the V3 classes). https://ai-sdk.dev/docs/migration-guides/migration-guide-6-0

  5. Vercel — AI SDK Core: Generating structured data with Output.object. https://ai-sdk.dev/docs/ai-sdk-core/generating-structured-data

  6. Vercel — AI SDK Core: Tool Calling and multi-step loops with stopWhen/stepCountIs. https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling

  7. Vercel — AI SDK Core: streamText reference. https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text