W&B Weave for Evaluation

W&B Weave Introduction

W&B Weave is Weights & Biases' framework for building, evaluating, and iterating on LLM applications. It emphasizes evaluation-driven development and seamless experiment tracking.

What is Weave?

Weave provides:

| Feature | Description |
| --- | --- |
| Tracing | Automatic logging of LLM calls and chains |
| Evaluation | Built-in evaluation framework with scorers |
| Versioning | Track changes to prompts, models, and data |
| Visualization | Interactive UI for exploring results |

Why Weave?

Weave is designed around the principle of evaluation-driven development:

  1. Write evaluations first
  2. Run experiments against evaluations
  3. Iterate based on results
  4. Track improvements over time
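The loop above can be pictured in plain Python. This is an illustrative sketch only, not Weave's API; the dataset, scorer, and model function here are all hypothetical stand-ins:

```python
# Illustrative evaluation-first loop; all names here are hypothetical,
# not part of the Weave API.

# 1. Write evaluations first: test cases with expected outputs.
dataset = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "capital of France", "expected": "Paris"},
]

def exact_match(expected: str, output: str) -> bool:
    """A minimal scorer: did the model return the expected answer?"""
    return expected.strip().lower() == output.strip().lower()

def toy_model(question: str) -> str:
    """Stand-in for an LLM call."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")

# 2. Run experiments against the evaluations.
def run_eval(model) -> float:
    scores = [exact_match(row["expected"], model(row["question"]))
              for row in dataset]
    return sum(scores) / len(scores)

# 3-4. Iterate on the model and track this score over time.
score = run_eval(toy_model)  # 1.0 on this toy dataset
```

Weave's evaluation framework formalizes exactly these pieces (datasets, scorers, and tracked runs), as covered in the next section.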

Installation

pip install weave openai

Quick Start

Initialize Weave and start tracing:

import weave
from openai import OpenAI

# Initialize Weave with your project
weave.init('my-team/my-llm-project')

client = OpenAI()

# All OpenAI calls are now automatically traced
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, world!"}]
)

The @weave.op() Decorator

Track any function with the @weave.op() decorator:

import weave
from openai import OpenAI

weave.init('my-team/my-project')
client = OpenAI()

@weave.op()
def generate_summary(text: str) -> str:
    """Summarize the given text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": text}
        ]
    )
    return response.choices[0].message.content

# Every call is tracked with inputs and outputs
summary = generate_summary("Long article text here...")

Trace Structure

Weave creates hierarchical traces:

@weave.op() function call
├── Input: function arguments
├── Output: return value
├── Duration: execution time
├── Cost: token usage (if applicable)
└── Nested calls: child operations
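To make those fields concrete, here is a minimal stand-in decorator that records the same information for each call. This is purely illustrative and not Weave's implementation; Weave additionally nests child calls and captures token usage:

```python
import time
from functools import wraps

# Flat log of recorded calls; Weave organizes these hierarchically instead.
traces = []

def traced(fn):
    """Record inputs, output, and duration per call (illustrative only)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        traces.append({
            "op": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
            "duration_s": time.perf_counter() - start,
        })
        return output
    return wrapper

@traced
def add(a: int, b: int) -> int:
    return a + b

add(2, 3)
# traces[0] now holds the op name, inputs, output, and duration
```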

Viewing Traces

Access your traces in the W&B UI:

  1. Go to wandb.ai
  2. Navigate to your project
  3. Select the Weave tab
  4. Browse traces, filter, and analyze

Project Organization

Structure your Weave projects:

my-team/
├── support-bot/          # Production support chatbot
│   ├── traces
│   ├── evaluations
│   └── experiments
├── content-generator/    # Content generation pipeline
│   ├── traces
│   └── evaluations
└── rag-system/          # RAG application
    ├── traces
    └── evaluations

Authentication

Set your W&B API key:

export WANDB_API_KEY=your-api-key

Or login interactively:

wandb login

Key Concepts

| Concept | Description |
| --- | --- |
| Op | A tracked function decorated with @weave.op() |
| Trace | A recorded execution of an op with all data |
| Evaluation | A set of test cases with scorers |
| Model | A versioned LLM configuration |
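The "Model" concept, a versioned bundle of model name, prompt, and parameters, can be pictured with a plain dataclass. This is an illustrative stand-in, not Weave's `Model` class:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    """A versioned LLM configuration (illustrative stand-in, not weave.Model)."""
    name: str
    model: str
    system_prompt: str
    temperature: float = 0.0

v1 = LLMConfig("summarizer", "gpt-4o-mini", "Summarize concisely.")
v2 = LLMConfig("summarizer", "gpt-4o-mini", "Summarize in one sentence.", 0.2)

# A change to any field yields a distinct version to track and compare.
assert v1 != v2
```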

Tip: Start by adding @weave.op() to your main LLM functions. You can add more granular tracking later.

Next, we'll learn how to set up evaluation pipelines in Weave.
