Real-World Builds & Monetization
Capstone: Multi-Agent Content Pipeline
Outcome: By the end of this lesson you will have shipped a running CrewAI pipeline of 4 specialized agents — Researcher → Writer → Fact-Checker → Publisher — that takes a topic as input, gathers sources, drafts a post, verifies every factual claim, and saves the finished markdown (or posts it to your CMS via a webhook). This is the multi-agent orchestration you've been building toward for five modules.
What you'll ship — this command runs the full crew:
```bash
python pipeline.py --topic "The state of open-source agent frameworks in Q2 2026"
```
Output (to stdout and `./output/{slug}.md`):

```text
[Researcher] Found 8 sources. Top sources: crewAI GitHub 2026 star history,
  Anthropic SDK 2026 release notes, LangChain v1.0 announcement...
[Writer] Drafted 1840-word post. 4 sections, 12 claims marked for verification.
[Fact-Checker] Verified 10/12 claims. Flagged 2 for revision:
  - "CrewAI has 80K GitHub stars" — actual is ~32K. Corrected.
  - "LangChain v1.0 shipped April 2026" — actual is June 2026. Corrected.
[Publisher] Saved → ./output/state-of-open-source-agent-frameworks.md
  Webhook POST → https://your-cms.com/api/drafts → 201 Created
```
The architecture you're building
```json
{
  "type": "architecture",
  "title": "Multi-Agent Content Pipeline — CrewAI Sequential Process",
  "direction": "top-down",
  "layers": [
    {
      "label": "Input",
      "color": "blue",
      "components": [
        { "label": "--topic \"...\"", "description": "CLI arg; one editorial prompt per run" }
      ]
    },
    {
      "label": "Researcher (Claude Sonnet 4.6)",
      "color": "amber",
      "components": [
        { "label": "Web search tool (Tavily)", "description": "Live web, last 90 days" },
        { "label": "Output: research brief", "description": "3-5 findings + 6-10 cited sources + flagged claims" }
      ]
    },
    {
      "label": "Writer (Claude Sonnet 4.6)",
      "color": "purple",
      "components": [
        { "label": "Receives: research brief", "description": "Via CrewAI task.context" },
        { "label": "Output: 1500-2000 word draft", "description": "Every claim tagged [^N] for fact-checker" }
      ]
    },
    {
      "label": "Fact-Checker (Claude Opus 4.7)",
      "color": "red",
      "components": [
        { "label": "Web search tool (Tavily)", "description": "Re-verify every [^N] claim" },
        { "label": "Output: corrected markdown + change log", "description": "Wrong claims fixed, unverifiable claims hedged" }
      ]
    },
    {
      "label": "Publisher (Claude Haiku 4.6 — cheap)",
      "color": "green",
      "components": [
        { "label": "Adds frontmatter YAML", "description": "title, slug, date, tags" },
        { "label": "publish_to_cms webhook", "description": "POSTs to your CMS; saves to ./output/{slug}.md" }
      ]
    }
  ]
}
```
Part 1 — Project setup (5 min)
```text
content-crew/
├── pipeline.py            # Orchestrates the 4 agents
├── agents/
│   ├── researcher.py      # Defines the Researcher agent + its tools
│   ├── writer.py
│   ├── fact_checker.py
│   └── publisher.py
├── tools/
│   ├── web_search.py      # Tavily search wrapper
│   └── cms_webhook.py     # POST to your CMS
├── requirements.txt
├── .env.example
└── output/                # Generated posts land here
```
`requirements.txt`:

```text
crewai==0.95.0
crewai-tools==0.25.0
anthropic==0.42.0
tavily-python==0.7.1
httpx==0.27.2
python-dotenv==1.0.1
```
`.env.example`:

```text
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...                          # tavily.com — 1000 free searches/month
CMS_WEBHOOK_URL=https://your-cms.com/api/drafts  # optional; leave empty to skip publish
CMS_WEBHOOK_TOKEN=                               # bearer token for your CMS
```
Install:

```bash
pip install -r requirements.txt
cp .env.example .env  # fill in keys
```
Part 2 — Custom tools (10 min)
CrewAI agents use the same `@tool` decorator pattern from Module 3. Two tools power this crew: web search (Researcher) and a CMS webhook (Publisher).
`tools/web_search.py`:

```python
import os

from crewai.tools import tool
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])


@tool("web_search")
def web_search(query: str, max_results: int = 6) -> str:
    """Search the live web. Returns a numbered list of results with title, URL, and snippet."""
    resp = tavily.search(
        query=query,
        max_results=max_results,
        search_depth="advanced",
        include_answer=True,
        days=90,  # last 3 months only — we want current
    )
    lines = [f"Summary: {resp.get('answer', '')}\n"]
    for i, r in enumerate(resp.get("results", []), 1):
        lines.append(f"[{i}] {r['title']}\n    {r['url']}\n    {r['content'][:300]}")
    return "\n".join(lines)
```
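If you want to sanity-check the formatting loop without spending Tavily credits, you can run it over a mocked response dict. A sketch — the dict below mirrors the response shape the tool assumes, not an official Tavily schema:

```python
# Mocked Tavily-style response. The shape is assumed from the tool above,
# not taken from Tavily's documentation.
resp = {
    "answer": "Agent frameworks are consolidating.",
    "results": [
        {"title": "Example post", "url": "https://example.com/a", "content": "x" * 500},
    ],
}

# Same formatting logic as the web_search tool body.
lines = [f"Summary: {resp.get('answer', '')}\n"]
for i, r in enumerate(resp.get("results", []), 1):
    lines.append(f"[{i}] {r['title']}\n    {r['url']}\n    {r['content'][:300]}")

formatted = "\n".join(lines)
print(formatted)
```

Note that each snippet is truncated to 300 characters before it reaches the agent, which keeps the tool's output within a predictable token budget.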
`tools/cms_webhook.py`:

```python
import os

import httpx
from crewai.tools import tool


@tool("publish_to_cms")
def publish_to_cms(slug: str, title: str, markdown: str) -> str:
    """
    Publish the finished post to the CMS via webhook.
    Returns the webhook response or a skip message if no URL is configured.
    """
    url = os.environ.get("CMS_WEBHOOK_URL", "").strip()
    if not url:
        return "CMS_WEBHOOK_URL not set — skipping publish (saved to disk only)."
    headers = {"Content-Type": "application/json"}
    token = os.environ.get("CMS_WEBHOOK_TOKEN", "").strip()
    if token:
        headers["Authorization"] = f"Bearer {token}"
    r = httpx.post(url, json={"slug": slug, "title": title, "markdown": markdown}, headers=headers, timeout=30)
    return f"POST {url} → {r.status_code} {r.reason_phrase}"
```
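The skip-or-authenticate decision is easy to verify in isolation. Here is the same logic factored into a pure helper (a sketch; `webhook_plan` is my name for it, not anything in CrewAI):

```python
def webhook_plan(env: dict) -> dict:
    """Decide whether to publish, and with which headers, from env vars alone.

    Mirrors the decision logic of publish_to_cms; the function name and return
    shape are illustrative assumptions.
    """
    url = env.get("CMS_WEBHOOK_URL", "").strip()
    if not url:
        return {"publish": False, "reason": "CMS_WEBHOOK_URL not set"}
    headers = {"Content-Type": "application/json"}
    token = env.get("CMS_WEBHOOK_TOKEN", "").strip()
    if token:
        # Bearer auth only when a token is configured
        headers["Authorization"] = f"Bearer {token}"
    return {"publish": True, "url": url, "headers": headers}


print(webhook_plan({}))
print(webhook_plan({"CMS_WEBHOOK_URL": "https://example.com/api", "CMS_WEBHOOK_TOKEN": "tok"}))
```

Keeping the tool tolerant of missing configuration means a fresh clone still runs end to end; the crew simply reports the skip instead of crashing mid-pipeline.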
Part 3 — The four agents (15 min)
Each agent has one job — that's the whole point of orchestration. An agent with three responsibilities quickly turns into an agent that does none of them well.
`agents/researcher.py`:

```python
from crewai import Agent

from tools.web_search import web_search

researcher = Agent(
    role="Research Analyst",
    goal="Gather 6-10 authoritative, recent sources on the assigned topic",
    backstory=(
        "You are a veteran research analyst. You trust primary sources (vendor docs, "
        "GitHub repos, official announcements) over secondary summaries. You are "
        "suspicious of round numbers and recent dates — both are red flags for "
        "fabrication, so you flag them for the fact-checker to verify."
    ),
    tools=[web_search],
    llm="anthropic/claude-sonnet-4-6",
    verbose=True,
    allow_delegation=False,
)
```
`agents/writer.py`:

```python
from crewai import Agent

writer = Agent(
    role="Technical Writer",
    goal="Draft a 1500-2000 word post that is accurate, scannable, and cites every numerical claim",
    backstory=(
        "You write for developers. You never pad, never use buzzwords, never say "
        "'revolutionary'. You structure posts as: TL;DR (3 bullets) → one H2 per main "
        "idea → a concluding 'what to watch'. For every statistic, version number, "
        "date, or comparison, you include an inline [^N] citation that the "
        "fact-checker will verify. You draft the footnote block at the bottom "
        "referencing the Researcher's sources."
    ),
    llm="anthropic/claude-sonnet-4-6",
    verbose=True,
    allow_delegation=False,
)
```
`agents/fact_checker.py`:

```python
from crewai import Agent

from tools.web_search import web_search

fact_checker = Agent(
    role="Fact Checker",
    goal=(
        "Verify every numerical claim, version number, date, and 'first/only/largest' "
        "superlative in the draft against live sources. Correct or flag anything wrong."
    ),
    backstory=(
        "You are paranoid by trade. You assume every stat is wrong until you've found "
        "an authoritative source confirming it. You treat negative claims ('X doesn't "
        "have Y', 'X has no API') with extra suspicion — these are the single most "
        "common source of errors, because absence of evidence isn't evidence of absence. "
        "You correct inline by editing the draft, preserving the rest of the writer's voice."
    ),
    tools=[web_search],
    llm="anthropic/claude-opus-4-7",  # larger model for nuanced verification
    verbose=True,
    allow_delegation=False,
)
```
`agents/publisher.py`:

```python
from crewai import Agent

from tools.cms_webhook import publish_to_cms

publisher = Agent(
    role="Publisher",
    goal="Save the final post to disk and push to the CMS webhook",
    backstory=(
        "You are the last pair of eyes. You add frontmatter YAML (title, slug, date, tags), "
        "save the post to ./output/{slug}.md, and POST it to the CMS webhook. "
        "You do not edit content — that's the writer's and fact-checker's job."
    ),
    tools=[publish_to_cms],
    llm="anthropic/claude-haiku-4-6",  # cheap model for mechanical work
    verbose=True,
    allow_delegation=False,
)
```
The model choice is deliberate:
- Researcher: Sonnet — broad reasoning + tool use
- Writer: Sonnet — quality drafting, style control
- Fact-checker: Opus — highest accuracy, worth the cost for correctness
- Publisher: Haiku — tiny mechanical task, ~5-6× cheaper
Matching model power to task difficulty cuts your crew's total cost by ~60% vs. using Sonnet for every agent (pattern from Module 5 Lesson 1).
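The per-crew cost is easy to estimate yourself. A sketch of the arithmetic — the per-million-token prices and output-token budgets below are illustrative assumptions, not real rate cards or measured usage:

```python
# Illustrative per-million-output-token prices (assumptions, not a price sheet).
PRICE_PER_MTOK = {"opus": 15.00, "sonnet": 3.00, "haiku": 0.80}

# Rough output-token budget per agent for one post (also an assumption).
AGENTS = {
    "researcher": ("sonnet", 2_000),
    "writer": ("sonnet", 3_000),
    "fact_checker": ("opus", 4_000),
    "publisher": ("haiku", 1_500),
}


def crew_cost(agents: dict) -> float:
    """Sum of (tokens / 1M) * price across all agents."""
    return sum(toks / 1_000_000 * PRICE_PER_MTOK[model] for model, toks in agents.values())


print(f"estimated output cost per post: ${crew_cost(AGENTS):.4f}")
```

Swap the model assignments in `AGENTS` to compare configurations; the point is that the mechanical Publisher step contributes almost nothing to the total, so downgrading it is nearly free quality-wise.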
Part 4 — The orchestrating crew (10 min)
`pipeline.py`:

```python
import argparse
import re
from pathlib import Path

from dotenv import load_dotenv
from crewai import Crew, Task, Process

from agents.researcher import researcher
from agents.writer import writer
from agents.fact_checker import fact_checker
from agents.publisher import publisher

load_dotenv()
Path("output").mkdir(exist_ok=True)


def slugify(title: str) -> str:
    s = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"[\s_-]+", "-", s).strip("-")[:80]


def build_crew(topic: str) -> Crew:
    research_task = Task(
        description=(
            f"Research the topic: '{topic}'.\n\n"
            "Deliverable: a structured brief containing:\n"
            "1. 3-5 key findings (one sentence each)\n"
            "2. 6-10 source citations in format [N] Title — URL\n"
            "3. Any claim you flag as 'potentially unreliable' (round numbers, recent dates, superlatives)\n\n"
            "Prioritize sources from the last 90 days. Avoid aggregator blogs — go to primary sources."
        ),
        expected_output="A research brief with findings, cited sources, and flagged claims.",
        agent=researcher,
    )

    write_task = Task(
        description=(
            "Using the research brief from the Researcher, draft a 1500-2000 word blog post.\n\n"
            "Structure:\n"
            "- TL;DR: 3 bullets\n"
            "- 3-5 H2 sections, each ~300 words\n"
            "- Concluding 'What to watch' section\n\n"
            "For every numerical claim or date, include a [^N] inline citation and list "
            "the matching footnote at the bottom using the Researcher's source URLs. "
            "Do NOT invent claims not supported by the research brief."
        ),
        expected_output="A complete markdown blog post with inline citations and a footnote block.",
        agent=writer,
        context=[research_task],
    )

    factcheck_task = Task(
        description=(
            "Review the draft from the Writer. For EVERY [^N] citation:\n"
            "1. Re-search the cited fact using web_search.\n"
            "2. Compare what the draft says against the sources.\n"
            "3. If wrong, edit the draft inline to fix it. Preserve the rest of the writer's voice.\n"
            "4. Flag any claim you couldn't verify — rewrite it with hedged language like "
            "   'as of [date]' or 'reportedly' rather than stating it as fact.\n\n"
            "Also scan for unchecked negative claims ('X has no Y') — these are the #1 "
            "source of errors. Search each one independently before accepting it.\n\n"
            "Output: the corrected markdown, plus a change log listing which claims "
            "you fixed and which you hedged."
        ),
        expected_output="Corrected markdown draft + change log of edits.",
        agent=fact_checker,
        context=[research_task, write_task],
    )

    publish_task = Task(
        description=(
            "Take the fact-checker's corrected markdown and:\n"
            "1. Prepend frontmatter YAML with: title (infer from H1), slug, date (today), "
            "   tags (3-5 inferred from content), featured: true.\n"
            "2. Use the `publish_to_cms` tool with (slug, title, full_markdown).\n"
            "3. Report the final status (CMS response or skip message).\n\n"
            "Do NOT modify the body. Only add frontmatter."
        ),
        expected_output="Final markdown with frontmatter + webhook status.",
        agent=publisher,
        context=[factcheck_task],
    )

    return Crew(
        agents=[researcher, writer, fact_checker, publisher],
        tasks=[research_task, write_task, factcheck_task, publish_task],
        process=Process.sequential,
        verbose=True,
    )


def save_output(topic: str, final_markdown: str) -> Path:
    slug = slugify(topic)
    path = Path("output") / f"{slug}.md"
    path.write_text(final_markdown, encoding="utf-8")
    print(f"\n✓ Saved → {path}")
    return path


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--topic", required=True, help="Blog post topic (1-2 sentences)")
    args = parser.parse_args()

    crew = build_crew(args.topic)
    result = crew.kickoff()
    # result.raw holds the final task's output — here, the Publisher's markdown with frontmatter
    save_output(args.topic, str(result.raw))
```
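The `slugify` helper deserves a quick sanity check on its own: it lowercases, strips punctuation, and collapses runs of whitespace, underscores, and hyphens into single hyphens before truncating to 80 characters.

```python
import re


def slugify(title: str) -> str:
    # Same helper as in pipeline.py: drop punctuation, then collapse
    # whitespace/underscores/hyphens into single hyphens.
    s = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"[\s_-]+", "-", s).strip("-")[:80]


print(slugify("Hello, World: Q2 2026!"))   # hello-world-q2-2026
print(slugify("  --weird__input--  "))     # weird-input
```

Because the slug doubles as the output filename, this normalization is what guarantees `./output/{slug}.md` is always a safe path regardless of how messy the `--topic` string is.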
Part 5 — Run it (2 min)
```bash
python pipeline.py --topic "The state of open-source agent frameworks in Q2 2026"
```
You should see the 4 agents run sequentially:
```text
[Researcher] Working on research task...
  → web_search called (3 times)
  → brief with 8 sources returned
[Writer] Working on draft task...
  → 1840-word draft returned
[Fact-Checker] Working on verification task...
  → web_search called (12 times)
  → 2 corrections, 0 hedged
[Publisher] Working on publish task...
  → publish_to_cms called
  → Saved → output/state-of-open-source-agent-frameworks.md
  → POST https://your-cms.com/... → 201 Created
```
Total runtime: 3–6 minutes depending on web search latency. Total cost on Anthropic: $0.40–$0.80 per post at 2026 prices.
Part 6 — What changes when you swap the topic
The pipeline is topic-agnostic. To adapt for a different editorial beat:
| Change | Where |
|---|---|
| Different research focus | Agent backstory + `description` on `research_task` |
| Different post length | Writer task description (1500-2000 → 800-1200) |
| Different CMS (WordPress, Ghost, Notion, Obsidian) | Swap the `publish_to_cms` implementation in `tools/cms_webhook.py` |
| Add an SEO agent before Publisher | One new `Agent(...)` + one new `Task(...)` with `context=[factcheck_task]` |
| Make it a scheduled job | Wrap `pipeline.py` in a GitHub Actions cron (`0 9 * * 1-5`) |
Adding a 5th agent is ~15 lines of code — that's the power of CrewAI's structure.
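The scheduled-job row above can be sketched as a GitHub Actions workflow. This is a minimal sketch, not a tested config: the secret names, file path, and hard-coded topic are my assumptions, and you would normally generate the topic dynamically.

```yaml
# .github/workflows/content-crew.yml (sketch; adapt secret names to your repo)
name: content-crew
on:
  schedule:
    - cron: "0 9 * * 1-5"   # weekdays at 09:00 UTC
  workflow_dispatch: {}      # allow manual runs too
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python pipeline.py --topic "Weekly agent-framework roundup"
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          TAVILY_API_KEY: ${{ secrets.TAVILY_API_KEY }}
          CMS_WEBHOOK_URL: ${{ secrets.CMS_WEBHOOK_URL }}
          CMS_WEBHOOK_TOKEN: ${{ secrets.CMS_WEBHOOK_TOKEN }}
```

Since the Publisher already POSTs to your CMS, the workflow needs no artifact handling; the run log plus the CMS draft are the output.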
Part 7 — Troubleshooting matrix
| Symptom | First check | Typical cause |
|---|---|---|
| Researcher returns 0 results | Tavily dashboard | Free-tier exhausted, or `days=90` too restrictive |
| Writer output is very short (<500 words) | Writer backstory | Agent `goal` phrased as "summary" rather than "1500-2000 word post" |
| Fact-checker doesn't actually correct the draft | CrewAI version + `model_config` | Older CrewAI versions don't pass prior task outputs as editable context — upgrade to 0.95+ |
| Publisher fails webhook | `CMS_WEBHOOK_URL` | Endpoint wrong, auth header missing, or endpoint doesn't accept the payload shape |
| All agents use the same model even though you set different `llm` on each | env override | `CREWAI_DEFAULT_MODEL` env var trumps per-agent config — unset it |
| Crew hangs forever | Task descriptions | Ambiguous `expected_output` keeps agents iterating — tighten it |
Build checkpoint — finish this before claiming the certificate
- Ship the crew. `python pipeline.py --topic "anything"` runs through all 4 agents without errors.
- Ship the write path. The final markdown in `./output/` has correct frontmatter and real content.
- Ship a catch. Run on a topic where the Writer is likely to hallucinate stats (e.g. "Recent GitHub star counts for agent frameworks"). Confirm the Fact-Checker's change log shows at least 1 correction.
- Ship the webhook. Point `CMS_WEBHOOK_URL` at a real endpoint (or `httpbin.org/post` for testing). Confirm the Publisher POSTs and the response logs.
- Ship the proof. Screenshot the crew running, the final markdown, and the webhook response as your proof of work.
You've just shipped a multi-agent pipeline that does real editorial work on a topic-agnostic basis. Every orchestration pattern from the last five modules — agent specialization, task sequencing, tool routing, model-cost matching, fact-checking guardrails — is now running in production.
Next (optional): add a Slack integration so the crew drops a daily digest into your team channel, or hook it to an RSS feed so new tech announcements get turned into drafts automatically.