Production Deployment & Safety
Capstone: Automated Job-Form Filler
Outcome: By the end of this lesson you will have a working Claude computer-use agent that reads a job description, opens an application form in a real browser (sandboxed), fills every field with content tailored to that JD, attaches your resume PDF — and stops at the submit button for your review. No surprise submissions, no hallucinated employment history, no sketchy auto-applies.
This capstone combines everything from the last five modules: the agent loop (M1), Docker sandbox (M2), desktop automation (M3), browser automation (M4), and safety guardrails (M5).
What you'll ship — this command runs the agent end-to-end:
python apply.py \
--jd ./jds/anthropic-applied-ai.txt \
--form https://boards.greenhouse.io/anthropic/jobs/4087737 \
--resume ./me.pdf \
--profile ./profile.yaml
Output: Chrome opens (inside your Docker container), agent fills the form field by field citing which resume/profile detail it used for each, uploads the PDF, and exits with the browser paused on the review screen. You look at the form, make edits if needed, and hit submit yourself.
The architecture you're building
{
"type": "architecture",
"title": "Job-Form Filler — Agent Pipeline",
"direction": "top-down",
"layers": [
{
"label": "Input",
"color": "blue",
"components": [
{ "label": "Job description (.txt)", "description": "What to apply for" },
{ "label": "Your profile (.yaml)", "description": "Standing answers — the ONLY source of truth for facts" },
{ "label": "Resume (.pdf)", "description": "What to upload" }
]
},
{
"label": "Planner (pre-flight, one LLM call)",
"color": "amber",
"components": [
{ "label": "JD + profile → blueprint", "description": "Tailored answers for why-us / highlights / standard Qs" },
{ "label": "No invented employment history", "description": "System prompt forbids facts not in profile" }
]
},
{
"label": "Agent loop (Dockerized)",
"color": "pink",
"components": [
{ "label": "computer_20250124 tool", "description": "screenshot + click + type + scroll" },
{ "label": "file_upload tool", "description": "Handles the OS file dialog outside the browser viewport" },
{ "label": "stop_for_review tool", "description": "The guardrail — called instead of pressing Submit" }
]
},
{
"label": "Sandbox (Docker + Xvfb)",
"color": "purple",
"components": [
{ "label": "Chromium", "description": "Isolated browser at DISPLAY=:99" },
{ "label": "xdotool", "description": "Executes mouse + keyboard actions" },
{ "label": "VNC (port 5900)", "description": "Optional live preview" }
]
},
{
"label": "Output",
"color": "green",
"components": [
{ "label": "Filled form, paused at Submit", "description": "Human reviews, edits, submits" }
]
}
]
}
Part 1 — Project setup (5 min)
job-form-filler/
├── apply.py # Main agent loop
├── tools.py # Screenshot, click, type, scroll, upload
├── planner.py # JD → field-map planner
├── profile.yaml # Your standing answers (name, email, links)
├── jds/
│ └── anthropic-applied-ai.txt
├── me.pdf
├── requirements.txt
├── Dockerfile
└── .env.example
requirements.txt:
anthropic==0.42.0
pyyaml==6.0.2
python-dotenv==1.0.1
pillow==11.1.0
.env.example:
ANTHROPIC_API_KEY=sk-ant-...
profile.yaml — your standing answers so the agent doesn't hallucinate them:
name: "Your Full Name"
email: "you@example.com"
phone: "+1 555 0100"
linkedin: "https://linkedin.com/in/yourname"
github: "https://github.com/yourname"
website: "https://yourdomain.com"
location: "Cairo, Egypt (Remote)"
work_authorization: "Requires sponsorship in the US; EU citizen"
years_experience: 6
pronouns: "he/him"
salary_expectations: "$160K–$200K base"
# For "why us / why this role" style questions the agent drafts per-JD
writing_voice: "Direct, technical, no buzzwords. Lead with a concrete project."
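Before a run, it helps to fail fast on an incomplete profile: a blank email quietly becomes a blank form field. Here is a minimal pre-flight check (a sketch; the required-key list is an assumption, trim it to the fields your forms actually ask for):

```python
# profile_check.py -- fail fast on an incomplete profile.yaml.
# REQUIRED_KEYS is an assumption: adjust it to match your target forms.
REQUIRED_KEYS = ("name", "email", "phone", "work_authorization")

def validate_profile(profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the profile is usable."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in profile]
    problems += [f"empty value: {k}" for k, v in profile.items() if v in (None, "")]
    return problems
```

Call it in `apply.py` right after `yaml.safe_load(f)` and raise `SystemExit` if the list is non-empty, so a bad profile never reaches the browser.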
Part 2 — Docker sandbox (5 min)
Module 2 Lesson 3 built this already. Quick reminder:
Dockerfile:
FROM python:3.12-slim
RUN apt-get update && apt-get install -y \
xvfb x11vnc chromium chromium-driver xdotool scrot \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Xvfb = virtual framebuffer — screenshots work without a real display
ENV DISPLAY=:99
# Start Xvfb + the VNC server, then exec the command. CMD supplies the default
# apply.py invocation — override the flags after the image name at `docker run`.
ENTRYPOINT ["bash", "-c", "Xvfb :99 -screen 0 1280x800x24 & x11vnc -display :99 -forever -nopass -quiet & exec \"$@\"", "--"]
CMD ["python", "apply.py", "--jd", "jds/anthropic-applied-ai.txt", "--form", "https://boards.greenhouse.io/anthropic/jobs/4087737", "--resume", "me.pdf", "--profile", "profile.yaml"]
Build + run:
docker build -t job-form-filler .
docker run --rm --env-file .env \
-v $(pwd)/jds:/app/jds \
-v $(pwd)/me.pdf:/app/me.pdf \
-p 5900:5900 \
job-form-filler
(Port 5900 is VNC — if you want to watch the agent work live, connect a VNC viewer.)
Part 3 — The JD planner (10 min)
Before the agent touches the browser, it analyzes the JD and decides what "tailored" means for this specific role.
planner.py:
import yaml
from anthropic import Anthropic
client = Anthropic()
def plan(jd_text: str, profile: dict) -> dict:
"""
Turn the JD + profile into a field-map the agent uses while filling the form.
Output shape:
{
"role_summary": str,
"why_us": str, # 80-120 words, ready to paste into "why us?" boxes
"highlights": [str], # 3-5 bullet-ready achievements tailored to the JD
"answers": { standard_question: answer, ... },
}
"""
prompt = f"""You are preparing a job application. Read the JD and the candidate's profile,
then produce the JSON blueprint defined below. Do NOT invent employment history —
use ONLY facts from the profile.
JD:
\"\"\"
{jd_text}
\"\"\"
PROFILE:
{yaml.safe_dump(profile, sort_keys=False)}
Return ONLY a JSON object with this shape:
{{
"role_summary": "one sentence — what does this role actually do",
"why_us": "80-120 word paragraph, candidate's voice, referencing ONE concrete project from profile + ONE specific thing about this company/role",
"highlights": ["3-5 bullets tailored to what the JD emphasizes"],
"answers": {{
"years_experience": "...",
"work_authorization": "...",
"salary_expectations": "...",
"start_date": "Available with 4 weeks notice"
}}
}}
"""
reply = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{"role": "user", "content": prompt}],
)
# The model returns a fenced or bare JSON object
import json, re
text = reply.content[0].text
match = re.search(r"\{[\s\S]*\}", text)
if not match:
raise ValueError(f"Planner returned no JSON:\n{text[:500]}")
return json.loads(match.group(0))
Why a separate planner step: the browser agent's context is already crowded with screenshots and tool_use/tool_result cycles. Pre-computing the tailored answers once keeps the browser loop focused on UI mechanics, not re-thinking what to write.
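Before trusting the blueprint, a quick shape check is worth the three lines. This sketch mirrors the bounds the planner prompt itself specifies (80-120 words for `why_us`, 3-5 highlights), so a drifting planner gets caught before the browser loop starts:

```python
# blueprint_check.py -- sanity-check plan() output against the prompt's own spec.
def check_blueprint(bp: dict) -> list[str]:
    """Return problems found in a planner blueprint; empty list means it looks sane."""
    problems = [f"missing key: {k}"
                for k in ("role_summary", "why_us", "highlights", "answers")
                if k not in bp]
    words = len(bp.get("why_us", "").split())
    if not 80 <= words <= 120:
        problems.append(f"why_us is {words} words, expected 80-120")
    bullets = len(bp.get("highlights", []))
    if not 3 <= bullets <= 5:
        problems.append(f"{bullets} highlights, expected 3-5")
    return problems
```

Run it right after `plan()` returns; if it reports problems, re-run the planner rather than letting the agent paste a bad answer.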
Part 4 — Computer-use tools (10 min)
Standard computer-use tools from Module 3 + Module 4, plus one for file upload.
tools.py:
import subprocess
import base64
import time
def take_screenshot() -> str:
"""Capture the virtual display and return base64 PNG."""
subprocess.run(["scrot", "/tmp/screen.png"], check=True)
with open("/tmp/screen.png", "rb") as f:
return base64.b64encode(f.read()).decode()
def click(x: int, y: int, button: str = "left"):
btn = {"left": "1", "middle": "2", "right": "3"}[button]
subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", btn], check=True)
time.sleep(0.3)
def type_text(text: str):
# --delay 20 gives the page time to react to each keystroke
subprocess.run(["xdotool", "type", "--delay", "20", text], check=True)
def key(key_name: str):
subprocess.run(["xdotool", "key", key_name], check=True)
def scroll(direction: str, amount: int = 3):
key_name = {"down": "Page_Down", "up": "Page_Up"}[direction]
for _ in range(amount):
key(key_name)
time.sleep(0.3)
def file_upload(file_path: str):
"""Open the OS file dialog that Chrome already triggered, paste the path, submit."""
# Assumes Chrome's file-picker is open (we triggered it by clicking the upload input)
subprocess.run(["xdotool", "key", "ctrl+l"], check=True) # focus path bar on GNOME/KDE dialogs
time.sleep(0.3)
type_text(file_path)
time.sleep(0.2)
key("Return")
time.sleep(1.0)
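One subtlety worth guarding against: `xdotool type` sends each newline in its input as a Return keypress, which is fine inside a textarea but can submit a form from a single-line input. A sketch that turns newlines into explicit steps, so the caller decides where a Return is actually safe:

```python
# Break multi-line text (e.g. the "why us?" answer) into explicit steps:
# ("type", chunk) for literal text and ("key", "Return") for each newline,
# so a Return keypress is only ever sent on purpose.
def split_for_typing(text: str) -> list[tuple[str, str]]:
    steps: list[tuple[str, str]] = []
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line:
            steps.append(("type", line))
        if i < len(lines) - 1:
            steps.append(("key", "Return"))
    return steps
```

In the agent loop you would then call `type_text(arg)` for `"type"` steps and `key(arg)` for `"key"` steps, skipping the Returns when the focused field is single-line.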
Part 5 — The agent loop (10 min)
apply.py:
import argparse
import subprocess
import time

import yaml
from dotenv import load_dotenv
from anthropic import Anthropic

from planner import plan
from tools import take_screenshot, click, type_text, key, scroll, file_upload
load_dotenv()
client = Anthropic()
TOOL_DEFS = [
# Anthropic-managed computer tool (screenshot + coord-based click/type)
{
"type": "computer_20250124",
"name": "computer",
"display_width_px": 1280,
"display_height_px": 800,
"display_number": 99,
},
# Custom tool for file upload — the OS file dialog is outside the browser
# viewport so computer_20250124's click can't reliably hit it.
{
"name": "file_upload",
"description": "After you've clicked an upload button and the OS file dialog opened, call this with the absolute file path to submit it.",
"input_schema": {
"type": "object",
"properties": {"file_path": {"type": "string"}},
"required": ["file_path"],
},
},
# A stop tool — agent calls this instead of clicking Submit
{
"name": "stop_for_review",
"description": "Call this when the form is fully filled and the Submit button is visible but NOT yet pressed. This ends the agent turn so a human can review.",
"input_schema": {"type": "object", "properties": {"summary": {"type": "string"}}, "required": ["summary"]},
},
]
SYSTEM_PROMPT = """You are a careful application-filler.
RULES:
1. NEVER press Submit. Always call the `stop_for_review` tool when the form is fully filled.
2. Use ONLY information from the PLAN I give you. Do NOT invent employment history, education, or dates.
3. If a field has no matching plan value, leave it blank or check "prefer not to answer" if that option exists.
4. If you encounter a CAPTCHA or MFA challenge, call `stop_for_review` immediately — humans handle those.
5. Before every action, narrate one line: what you see, which field you're filling, which plan value maps to it.
6. Scroll to find every required field. Forms with * on many fields are common — don't skip any.
"""
def execute_computer_action(action_input: dict):
"""Translate a computer_20250124 action into xdotool calls."""
action = action_input.get("action")
if action == "screenshot":
return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()}}
if action == "left_click":
click(action_input["coordinate"][0], action_input["coordinate"][1])
elif action == "right_click":
click(action_input["coordinate"][0], action_input["coordinate"][1], button="right")
elif action == "type":
type_text(action_input["text"])
elif action == "key":
key(action_input["text"])
    elif action in ("scroll", "scroll_down"):
        # computer_20250124 sends the direction as "scroll_direction"
        direction = action_input.get("scroll_direction", action_input.get("direction", "down"))
        scroll(direction, action_input.get("scroll_amount", 3))
else:
return {"type": "text", "text": f"Unsupported action: {action}"}
return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()}}
def run_agent(jd_text: str, form_url: str, resume_path: str, profile: dict, max_turns: int = 40):
print("→ Planning...")
blueprint = plan(jd_text, profile)
print(f" role: {blueprint['role_summary']}")
print(f" highlights: {len(blueprint['highlights'])} bullets")
# Open Chrome at the form URL before the agent takes control
subprocess.Popen(["chromium", "--no-sandbox", "--start-maximized", form_url])
time.sleep(4) # let page settle
messages = [{
"role": "user",
"content": [{
"type": "text",
"text": f"Form URL: {form_url}\n\nResume path to upload: {resume_path}\n\n"
f"PLAN (use this, do not invent):\n{yaml.safe_dump(blueprint)}\n\n"
f"PROFILE:\n{yaml.safe_dump(profile)}\n\n"
f"Begin by taking a screenshot."
}]
}]
for turn in range(max_turns):
        # Beta tools (computer_20250124) go through the beta namespace
        response = client.beta.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=TOOL_DEFS,
            system=SYSTEM_PROMPT,
            betas=["computer-use-2025-01-24"],
            messages=messages,
        )
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
print(f"→ Agent stopped without calling tools: {response.stop_reason}")
break
tool_results = []
stop = False
for block in response.content:
if block.type != "tool_use":
continue
if block.name == "stop_for_review":
print(f"\n✓ {block.input.get('summary', 'Ready for review')}\n")
print(" Browser is paused at the review screen. Inspect the form and submit manually.")
stop = True
continue
if block.name == "file_upload":
file_upload(block.input["file_path"])
tool_results.append({"type": "tool_result", "tool_use_id": block.id,
"content": f"Uploaded {block.input['file_path']}."})
elif block.name == "computer":
result = execute_computer_action(block.input)
tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": [result]})
if stop:
break
messages.append({"role": "user", "content": tool_results})
print("\n→ Form open in Chrome (VNC port 5900). Review, edit if needed, and submit yourself.")
# Don't close the browser — let the user review.
input("Press Enter when done to close the browser and exit...")
if __name__ == "__main__":
import subprocess, time
parser = argparse.ArgumentParser()
parser.add_argument("--jd", required=True)
parser.add_argument("--form", required=True)
parser.add_argument("--resume", required=True)
parser.add_argument("--profile", required=True)
args = parser.parse_args()
with open(args.jd) as f:
jd_text = f.read()
with open(args.profile) as f:
profile = yaml.safe_load(f)
run_agent(jd_text, args.form, args.resume, profile)
Part 6 — Safety guardrails (10 min)
Module 5 Lesson 3 had these; here's how they plug into the capstone:
- The `stop_for_review` tool is the primary guardrail — the system prompt forbids pressing Submit and tells the agent to call this tool instead. When it does, the loop ends and a human takes over. Prompt rules are soft controls, so watch the run over VNC for extra assurance.
- No `bash` tool included — the agent literally cannot execute shell commands. Narrowing the tool set is the cheapest safety measure.
- Profile-only facts — the system prompt forbids invented employment history. With the planner blueprint and `profile.yaml` as the only trusted sources, the agent has nothing to confabulate from.
- Docker isolation — the agent runs in a container with only `/app` mounted, and without a shell tool it has no way to run `curl` or touch the host filesystem.
- CAPTCHA/MFA escape hatch — rule 4 of the system prompt tells the agent to call `stop_for_review` on those, avoiding any attempt to defeat them.
Part 7 — Troubleshooting matrix
| Symptom | First check | Typical cause |
|---|---|---|
| Agent tries to click Submit | inspect message log | System prompt not loaded — ensure system=SYSTEM_PROMPT is passed |
| File upload dialog fills with garbage | tools.py ctrl+l behavior | Not every file chooser focuses its path bar on ctrl+l (GTK dialogs do); connect over VNC to see which dialog Chromium opens and adjust the shortcut |
| Form fields filled wrong | inspect blueprint in logs | Planner hallucinated — tighten system prompt in planner.py to say "ONLY facts from profile" more aggressively |
| Browser opens but agent can't see it | DISPLAY env in Dockerfile | Xvfb not started or wrong display number |
| Agent stops on turn 3 with "task incomplete" | increase max_turns | Long forms (25+ fields) need 40-60 turns |
| MFA prompt appears and agent freezes | check screenshot | Agent correctly called stop_for_review — connect VNC and complete MFA yourself, then restart |
Build checkpoint — finish this before claiming the certificate

- Ship the sandbox. `docker build` + `docker run` start without errors and Xvfb initializes.
- Ship the planner. Run `planner.plan(jd_text, profile)` standalone and verify the JSON output mentions details from both the JD and your profile.
- Ship the full loop on ONE form. Pick the simplest form you found earlier. The agent should reach `stop_for_review` with the form fully filled.
- Ship the hard case. Try a form with (a) a file upload, (b) a dropdown select, (c) a long-answer "why us?" box. All three should work.
- Screenshot the filled form, the `stop_for_review` log line, and your final submission confirmation — as your proof of work.
You now have a safe, deployable computer-use agent. Every pattern from the last five modules just landed in a real workflow.
Next (optional): wire this to a job-board scraper (e.g., Greenhouse RSS feeds) so new postings get auto-applications drafted for your review every morning.