Production Deployment & Safety
Capstone: Automated Job-Form Filler
Outcome: By the end of this lesson you will have a working Claude computer-use agent that reads a job description, opens an application form in a real browser (sandboxed), fills every field with content tailored to that JD, attaches your resume PDF — and stops at the submit button for your review. No surprise submissions, no hallucinated employment history, no sketchy auto-applies.
This capstone combines everything from the last five modules: the agent loop (M1), Docker sandbox (M2), desktop automation (M3), browser automation (M4), and safety guardrails (M5).
What you'll ship — this command runs the agent end-to-end:
python apply.py \
--jd ./jds/anthropic-applied-ai.txt \
--form https://boards.greenhouse.io/anthropic/jobs/4087737 \
--resume ./me.pdf \
--profile ./profile.yaml
Output: Chrome opens (inside your Docker container), agent fills the form field by field citing which resume/profile detail it used for each, uploads the PDF, and exits with the browser paused on the review screen. You look at the form, make edits if needed, and hit submit yourself.
The architecture you're building
{
"type": "architecture",
"title": "Job-Form Filler — Agent Pipeline",
"direction": "top-down",
"layers": [
{
"label": "Input",
"color": "blue",
"components": [
{ "label": "Job description (.txt)", "description": "What to apply for" },
{ "label": "Your profile (.yaml)", "description": "Standing answers — the ONLY source of truth for facts" },
{ "label": "Resume (.pdf)", "description": "What to upload" }
]
},
{
"label": "Planner (pre-flight, one LLM call)",
"color": "amber",
"components": [
{ "label": "JD + profile → blueprint", "description": "Tailored answers for why-us / highlights / standard Qs" },
{ "label": "No invented employment history", "description": "System prompt forbids facts not in profile" }
]
},
{
"label": "Agent loop (Dockerized)",
"color": "pink",
"components": [
{ "label": "computer_20250124 tool", "description": "screenshot + click + type + scroll" },
{ "label": "file_upload tool", "description": "Handles the OS file dialog outside the browser viewport" },
{ "label": "stop_for_review tool", "description": "The guardrail — called instead of pressing Submit" }
]
},
{
"label": "Sandbox (Docker + Xvfb)",
"color": "purple",
"components": [
{ "label": "Chromium", "description": "Isolated browser at DISPLAY=:99" },
{ "label": "xdotool", "description": "Executes mouse + keyboard actions" },
{ "label": "VNC (port 5900)", "description": "Optional live preview" }
]
},
{
"label": "Output",
"color": "green",
"components": [
{ "label": "Filled form, paused at Submit", "description": "Human reviews, edits, submits" }
]
}
]
}
Part 1 — Project setup (5 min)
job-form-filler/
├── apply.py # Main agent loop
├── tools.py # Screenshot, click, type, scroll, upload
├── planner.py # JD → field-map planner
├── profile.yaml # Your standing answers (name, email, links)
├── jds/
│ └── anthropic-applied-ai.txt
├── me.pdf
├── requirements.txt
├── Dockerfile
└── .env.example
requirements.txt:
anthropic==0.42.0
pyyaml==6.0.2
python-dotenv==1.0.1
pillow==11.1.0
.env.example:
ANTHROPIC_API_KEY=sk-ant-...
profile.yaml — your standing answers so the agent doesn't hallucinate them:
name: "Your Full Name"
email: "you@example.com"
phone: "+1 555 0100"
linkedin: "https://linkedin.com/in/yourname"
github: "https://github.com/yourname"
website: "https://yourdomain.com"
location: "Cairo, Egypt (Remote)"
work_authorization: "Requires sponsorship in the US; EU citizen"
years_experience: 6
pronouns: "he/him"
salary_expectations: "$160K–$200K base"
# For "why us / why this role" style questions the agent drafts per-JD
writing_voice: "Direct, technical, no buzzwords. Lead with a concrete project."
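Before a run, it helps to fail fast on an incomplete profile: a blank email quietly becomes a blank form field. Here is a minimal pre-flight check (a sketch; the required-key list is an assumption, trim it to the fields your forms actually ask for):

```python
# profile_check.py -- fail fast on an incomplete profile.yaml.
# REQUIRED_KEYS is an assumption: adjust it to match your target forms.
REQUIRED_KEYS = ("name", "email", "phone", "work_authorization")

def validate_profile(profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the profile is usable."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in profile]
    problems += [f"empty value: {k}" for k, v in profile.items() if v in (None, "")]
    return problems
```

Call it in `apply.py` right after `yaml.safe_load(f)` and raise `SystemExit` if the list is non-empty, so a bad profile never reaches the browser.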
Part 2 — Docker sandbox (5 min)
Module 2 Lesson 3 built this already. Quick reminder:
Dockerfile:
FROM python:3.12-slim
RUN apt-get update && apt-get install -y \
xvfb x11vnc chromium chromium-driver xdotool scrot \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Xvfb = virtual framebuffer — screenshots work without a real display
ENV DISPLAY=:99
# Start Xvfb + the VNC server, then exec the command. CMD supplies the default
# apply.py invocation — override the flags after the image name at `docker run`.
ENTRYPOINT ["bash", "-c", "Xvfb :99 -screen 0 1280x800x24 & x11vnc -display :99 -forever -nopass -quiet & exec \"$@\"", "--"]
CMD ["python", "apply.py", "--jd", "jds/anthropic-applied-ai.txt", "--form", "https://boards.greenhouse.io/anthropic/jobs/4087737", "--resume", "me.pdf", "--profile", "profile.yaml"]
Build + run:
docker build -t job-form-filler .
docker run --rm --env-file .env \
-v $(pwd)/jds:/app/jds \
-v $(pwd)/me.pdf:/app/me.pdf \
-p 5900:5900 \
job-form-filler
(Port 5900 is VNC — if you want to watch the agent work live, connect a VNC viewer.)
Part 3 — The JD planner (10 min)
Before the agent touches the browser, it analyzes the JD and decides what "tailored" means for this specific role.
planner.py:
import yaml
from anthropic import Anthropic
client = Anthropic()
def plan(jd_text: str, profile: dict) -> dict:
"""
Turn the JD + profile into a field-map the agent uses while filling the form.
Output shape:
{
"role_summary": str,
"why_us": str, # 80-120 words, ready to paste into "why us?" boxes
"highlights": [str], # 3-5 bullet-ready achievements tailored to the JD
"answers": { standard_question: answer, ... },
}
"""
prompt = f"""You are preparing a job application. Read the JD and the candidate's profile,
then produce the JSON blueprint defined below. Do NOT invent employment history —
use ONLY facts from the profile.
JD:
\"\"\"
{jd_text}
\"\"\"
PROFILE:
{yaml.safe_dump(profile, sort_keys=False)}
Return ONLY a JSON object with this shape:
{{
"role_summary": "one sentence — what does this role actually do",
"why_us": "80-120 word paragraph, candidate's voice, referencing ONE concrete project from profile + ONE specific thing about this company/role",
"highlights": ["3-5 bullets tailored to what the JD emphasizes"],
"answers": {{
"years_experience": "...",
"work_authorization": "...",
"salary_expectations": "...",
"start_date": "Available with 4 weeks notice"
}}
}}
"""
reply = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": prompt}],
)
# The model returns a fenced or bare JSON object
import json, re
text = reply.content[0].text
match = re.search(r"\{[\s\S]*\}", text)
if not match:
raise ValueError(f"Planner returned no JSON:\n{text[:500]}")
return json.loads(match.group(0))
Why a separate planner step: the browser agent's context is already crowded with screenshots and tool_use/tool_result cycles. Pre-computing the tailored answers once keeps the browser loop focused on UI mechanics, not re-thinking what to write.
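Before trusting the blueprint, a quick shape check is worth the three lines. This sketch mirrors the bounds the planner prompt itself specifies (80-120 words for `why_us`, 3-5 highlights), so a drifting planner gets caught before the browser loop starts:

```python
# blueprint_check.py -- sanity-check plan() output against the prompt's own spec.
def check_blueprint(bp: dict) -> list[str]:
    """Return problems found in a planner blueprint; empty list means it looks sane."""
    problems = [f"missing key: {k}"
                for k in ("role_summary", "why_us", "highlights", "answers")
                if k not in bp]
    words = len(bp.get("why_us", "").split())
    if not 80 <= words <= 120:
        problems.append(f"why_us is {words} words, expected 80-120")
    bullets = len(bp.get("highlights", []))
    if not 3 <= bullets <= 5:
        problems.append(f"{bullets} highlights, expected 3-5")
    return problems
```

Run it right after `plan()` returns; if it reports problems, re-run the planner rather than letting the agent paste a bad answer.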
Part 4 — Computer-use tools (10 min)
Standard computer-use tools from Module 3 + Module 4, plus one for file upload.
tools.py:
import subprocess
import base64
import time
def take_screenshot() -> str:
"""Capture the virtual display and return base64 PNG."""
subprocess.run(["scrot", "/tmp/screen.png"], check=True)
with open("/tmp/screen.png", "rb") as f:
return base64.b64encode(f.read()).decode()
def click(x: int, y: int, button: str = "left"):
btn = {"left": "1", "middle": "2", "right": "3"}[button]
subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", btn], check=True)
time.sleep(0.3)
def type_text(text: str):
# --delay 20 gives the page time to react to each keystroke
subprocess.run(["xdotool", "type", "--delay", "20", text], check=True)
def key(key_name: str):
subprocess.run(["xdotool", "key", key_name], check=True)
def scroll(direction: str, amount: int = 3):
key_name = {"down": "Page_Down", "up": "Page_Up"}[direction]
for _ in range(amount):
key(key_name)
time.sleep(0.3)
def file_upload(file_path: str):
"""Open the OS file dialog that Chrome already triggered, paste the path, submit."""
# Assumes Chrome's file-picker is open (we triggered it by clicking the upload input)
subprocess.run(["xdotool", "key", "ctrl+l"], check=True) # focus path bar on GNOME/KDE dialogs
time.sleep(0.3)
type_text(file_path)
time.sleep(0.2)
key("Return")
time.sleep(1.0)
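One subtlety worth guarding against: `xdotool type` sends each newline in its input as a Return keypress, which is fine inside a textarea but can submit a form from a single-line input. A sketch that turns newlines into explicit steps, so the caller decides where a Return is actually safe:

```python
# Break multi-line text (e.g. the "why us?" answer) into explicit steps:
# ("type", chunk) for literal text and ("key", "Return") for each newline,
# so a Return keypress is only ever sent on purpose.
def split_for_typing(text: str) -> list[tuple[str, str]]:
    steps: list[tuple[str, str]] = []
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line:
            steps.append(("type", line))
        if i < len(lines) - 1:
            steps.append(("key", "Return"))
    return steps
```

In the agent loop you would then call `type_text(arg)` for `"type"` steps and `key(arg)` for `"key"` steps, skipping the Returns when the focused field is single-line.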
Part 5 — The agent loop (10 min)
apply.py:
import argparse
import subprocess
import time

import yaml
from dotenv import load_dotenv
from anthropic import Anthropic

from planner import plan
from tools import take_screenshot, click, type_text, key, scroll, file_upload
load_dotenv()
client = Anthropic()
TOOL_DEFS = [
# Anthropic-managed computer tool (screenshot + coord-based click/type)
{
"type": "computer_20250124",
"name": "computer",
"display_width_px": 1280,
"display_height_px": 800,
"display_number": 99,
},
# Custom tool for file upload — the OS file dialog is outside the browser
# viewport so computer_20250124's click can't reliably hit it.
{
"name": "file_upload",
"description": "After you've clicked an upload button and the OS file dialog opened, call this with the absolute file path to submit it.",
"input_schema": {
"type": "object",
"properties": {"file_path": {"type": "string"}},
"required": ["file_path"],
},
},
# A stop tool — agent calls this instead of clicking Submit
{
"name": "stop_for_review",
"description": "Call this when the form is fully filled and the Submit button is visible but NOT yet pressed. This ends the agent turn so a human can review.",
"input_schema": {"type": "object", "properties": {"summary": {"type": "string"}}, "required": ["summary"]},
},
]
SYSTEM_PROMPT = """You are a careful application-filler.
RULES:
1. NEVER press Submit. Always call the `stop_for_review` tool when the form is fully filled.
2. Use ONLY information from the PLAN I give you. Do NOT invent employment history, education, or dates.
3. If a field has no matching plan value, leave it blank or check "prefer not to answer" if that option exists.
4. If you encounter a CAPTCHA or MFA challenge, call `stop_for_review` immediately — humans handle those.
5. Before every action, narrate one line: what you see, which field you're filling, which plan value maps to it.
6. Scroll to find every required field. Forms with * on many fields are common — don't skip any.
"""
def execute_computer_action(action_input: dict):
"""Translate a computer_20250124 action into xdotool calls."""
action = action_input.get("action")
if action == "screenshot":
return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()}}
if action == "left_click":
click(action_input["coordinate"][0], action_input["coordinate"][1])
elif action == "right_click":
click(action_input["coordinate"][0], action_input["coordinate"][1], button="right")
elif action == "type":
type_text(action_input["text"])
elif action == "key":
key(action_input["text"])
    elif action in ("scroll", "scroll_down"):
        # computer_20250124 sends the direction as "scroll_direction"
        direction = action_input.get("scroll_direction", action_input.get("direction", "down"))
        scroll(direction, action_input.get("scroll_amount", 3))
else:
return {"type": "text", "text": f"Unsupported action: {action}"}
return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()}}
def run_agent(jd_text: str, form_url: str, resume_path: str, profile: dict, max_turns: int = 40):
print("→ Planning...")
blueprint = plan(jd_text, profile)
print(f" role: {blueprint['role_summary']}")
print(f" highlights: {len(blueprint['highlights'])} bullets")
# Open Chrome at the form URL before the agent takes control
subprocess.Popen(["chromium", "--no-sandbox", "--start-maximized", form_url])
time.sleep(4) # let page settle
messages = [{
"role": "user",
"content": [{
"type": "text",
"text": f"Form URL: {form_url}\n\nResume path to upload: {resume_path}\n\n"
f"PLAN (use this, do not invent):\n{yaml.safe_dump(blueprint)}\n\n"
f"PROFILE:\n{yaml.safe_dump(profile)}\n\n"
f"Begin by taking a screenshot."
}]
}]
for turn in range(max_turns):
        # Beta tools (computer_20250124) go through the beta namespace
        response = client.beta.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=TOOL_DEFS,
            system=SYSTEM_PROMPT,
            betas=["computer-use-2025-01-24"],
            messages=messages,
        )
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
print(f"→ Agent stopped without calling tools: {response.stop_reason}")
break
tool_results = []
stop = False
for block in response.content:
if block.type != "tool_use":
continue
if block.name == "stop_for_review":
print(f"\n✓ {block.input.get('summary', 'Ready for review')}\n")
print(" Browser is paused at the review screen. Inspect the form and submit manually.")
stop = True
continue
if block.name == "file_upload":
file_upload(block.input["file_path"])
tool_results.append({"type": "tool_result", "tool_use_id": block.id,
"content": f"Uploaded {block.input['file_path']}."})
elif block.name == "computer":
result = execute_computer_action(block.input)
tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": [result]})
if stop:
break
messages.append({"role": "user", "content": tool_results})
print("\n→ Form open in Chrome (VNC port 5900). Review, edit if needed, and submit yourself.")
# Don't close the browser — let the user review.
input("Press Enter when done to close the browser and exit...")
if __name__ == "__main__":
import subprocess, time
parser = argparse.ArgumentParser()
parser.add_argument("--jd", required=True)
parser.add_argument("--form", required=True)
parser.add_argument("--resume", required=True)
parser.add_argument("--profile", required=True)
args = parser.parse_args()
with open(args.jd) as f:
jd_text = f.read()
with open(args.profile) as f:
profile = yaml.safe_load(f)
run_agent(jd_text, args.form, args.resume, profile)
Part 6 — Safety guardrails (10 min)
Module 5 Lesson 3 had these; here's how they plug into the capstone:
- The `stop_for_review` tool is the primary guardrail — the system prompt forbids pressing Submit and tells the agent to call this tool instead. When it does, the loop ends and a human takes over. Prompt rules are soft controls, so watch the run over VNC for extra assurance.
- No `bash` tool included — the agent literally cannot execute shell commands. Narrowing the tool set is the cheapest safety measure.
- Profile-only facts — the system prompt forbids invented employment history. With the planner blueprint and `profile.yaml` as the only trusted sources, the agent has nothing to confabulate from.
- Docker isolation — the agent runs in a container with only `/app` mounted, and without a shell tool it has no way to run `curl` or touch the host filesystem.
- CAPTCHA/MFA escape hatch — rule 4 of the system prompt tells the agent to call `stop_for_review` on those, avoiding any attempt to defeat them.
Part 7 — Troubleshooting matrix
| Symptom | First check | Typical cause |
|---|---|---|
| Agent tries to click Submit | inspect message log | System prompt not loaded — ensure system=SYSTEM_PROMPT is passed |
| File upload dialog fills with garbage | tools.py ctrl+l behavior | Not every file chooser focuses its path bar on ctrl+l (GTK dialogs do); connect over VNC to see which dialog Chromium opens and adjust the shortcut |
| Form fields filled wrong | inspect blueprint in logs | Planner hallucinated — tighten system prompt in planner.py to say "ONLY facts from profile" more aggressively |
| Browser opens but agent can't see it | DISPLAY env in Dockerfile | Xvfb not started or wrong display number |
| Agent stops on turn 3 with "task incomplete" | increase max_turns | Long forms (25+ fields) need 40-60 turns |
| MFA prompt appears and agent freezes | check screenshot | Agent correctly called stop_for_review — connect VNC and complete MFA yourself, then restart |
Build checkpoint — finish this before claiming the certificate

- Ship the sandbox. `docker build` + `docker run` start without errors and Xvfb initializes.
- Ship the planner. Run `planner.plan(jd_text, profile)` standalone and verify the JSON output mentions details from both the JD and your profile.
- Ship the full loop on ONE form. Pick the simplest form you found earlier. The agent should reach `stop_for_review` with the form fully filled.
- Ship the hard case. Try a form with (a) a file upload, (b) a dropdown select, (c) a long-answer "why us?" box. All three should work.
- Screenshot the filled form, the `stop_for_review` log line, and your final submission confirmation — as your proof of work.
You now have a safe, deployable computer-use agent. Every pattern from the last five modules just landed in a real workflow.
Next (optional): wire this to a job-board scraper (e.g., Greenhouse RSS feeds) so new postings get auto-applications drafted for your review every morning.