W&B Weave for Evaluation
W&B Weave is Weights & Biases' framework for building, evaluating, and iterating on LLM applications. It emphasizes evaluation-driven development and seamless experiment tracking.
What is Weave?
Weave provides:
| Feature | Description |
|---|---|
| Tracing | Automatic logging of LLM calls and chains |
| Evaluation | Built-in evaluation framework with scorers |
| Versioning | Track changes to prompts, models, and data |
| Visualization | Interactive UI for exploring results |
Why Weave?
Weave is designed around the principle of evaluation-driven development:
- Write evaluations first
- Run experiments against evaluations
- Iterate based on results
- Track improvements over time
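As a taste of what this loop looks like in code, here is a minimal sketch using weave.Evaluation. The dataset rows, the exact_match scorer, and the stand-in my_model function are illustrative placeholders, and scorer argument names can vary slightly across Weave versions; the full setup is covered in the next section.

```python
import asyncio
import weave

weave.init('my-team/my-project')

# Tiny illustrative dataset; the column names are arbitrary.
examples = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # Scorers receive the model output plus matching dataset columns.
    return {"correct": expected.lower() in output.lower()}

@weave.op()
def my_model(question: str) -> str:
    # Stand-in for a real LLM call.
    return "4" if "2 + 2" in question else "Paris"

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(my_model))
```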
Installation
```bash
pip install weave openai
```
Quick Start
Initialize Weave and start tracing:
```python
import weave
from openai import OpenAI

# Initialize Weave with your project
weave.init('my-team/my-llm-project')

client = OpenAI()

# All OpenAI calls are now automatically traced
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
```
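When weave.init runs, it typically prints a link to your project in the terminal, so you can jump straight from a script run to the recorded trace in the UI.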
The @weave.op() Decorator
Track any function with the @weave.op() decorator:
```python
import weave
from openai import OpenAI

weave.init('my-team/my-project')
client = OpenAI()

@weave.op()
def generate_summary(text: str) -> str:
    """Summarize the given text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": text}
        ]
    )
    return response.choices[0].message.content

# Every call is tracked with inputs and outputs
summary = generate_summary("Long article text here...")
```
Trace Structure
Weave creates hierarchical traces:
```text
@weave.op() function call
├── Input: function arguments
├── Output: return value
├── Duration: execution time
├── Cost: token usage (if applicable)
└── Nested calls: child operations
```
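To see this hierarchy in practice, here is a small sketch where one op calls another. The retrieve_context and answer_question names are hypothetical, and the retrieval step is a placeholder rather than a real vector-store lookup:

```python
import weave
from openai import OpenAI

weave.init('my-team/my-project')
client = OpenAI()

@weave.op()
def retrieve_context(question: str) -> str:
    # Placeholder retrieval step; a real app might query a vector store.
    return f"Background material related to: {question}"

@weave.op()
def answer_question(question: str) -> str:
    # Calling one op from another records it as a nested child call.
    context = retrieve_context(question)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# answer_question is the parent trace; retrieve_context and the
# OpenAI call appear as children with their own inputs and outputs.
print(answer_question("What does Weave trace?"))
```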
Viewing Traces
Access your traces in the W&B UI:
- Go to wandb.ai
- Navigate to your project
- Select the Weave tab
- Browse traces, filter, and analyze
Project Organization
Structure your Weave projects:
```text
my-team/
├── support-bot/           # Production support chatbot
│   ├── traces
│   ├── evaluations
│   └── experiments
├── content-generator/     # Content generation pipeline
│   ├── traces
│   └── evaluations
└── rag-system/            # RAG application
    ├── traces
    └── evaluations
```
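In code, each application simply initializes its own project. The project names below mirror the hypothetical layout above; use one weave.init call per process:

```python
import weave

# Point traces from each application at its own project.
weave.init('my-team/support-bot')          # support chatbot traces
# weave.init('my-team/content-generator')  # content pipeline traces
# weave.init('my-team/rag-system')         # RAG application traces
```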
Authentication
Set your W&B API key:
```bash
export WANDB_API_KEY=your-api-key
```
Or login interactively:
```bash
wandb login
```
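If you prefer to authenticate from Python, for example in a notebook, wandb.login also accepts a key directly. A minimal sketch, assuming the key is already set in your environment:

```python
import os
import wandb

# Programmatic equivalent of `wandb login`; reads the API key from the environment.
wandb.login(key=os.environ["WANDB_API_KEY"])
```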
Key Concepts
| Concept | Description |
|---|---|
| Op | A tracked function decorated with @weave.op() |
| Trace | A recorded execution of an op with all data |
| Evaluation | A set of test cases with scorers |
| Model | A versioned LLM configuration |
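The Model concept deserves a closer look: a weave.Model is a class whose configuration fields are versioned, so changing a prompt or model name produces a new tracked version. A minimal sketch, where SummaryModel and its fields are illustrative:

```python
import weave
from openai import OpenAI

weave.init('my-team/my-project')

class SummaryModel(weave.Model):
    # These fields are versioned: editing either one yields a new model version.
    model_name: str = "gpt-4o-mini"
    system_prompt: str = "Summarize the following text concisely."

    @weave.op()
    def predict(self, text: str) -> str:
        client = OpenAI()
        response = client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content

model = SummaryModel()
print(model.predict("Long article text here..."))
```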
Tip: Start by adding @weave.op() to your main LLM functions. You can add more granular tracking later.
Next, we'll learn how to set up evaluation pipelines in Weave.