Lesson 10 of 13

Working with AI Agents

Safety and Oversight

2 min read

AI agents are powerful, but that power comes with responsibility. Understanding safety practices protects your organization and builds trust.

Why Safety Matters

Agents can:

  • Access sensitive data
  • Take actions with real-world consequences
  • Make decisions at scale
  • Operate autonomously

Without proper guardrails, small errors can compound quickly.

Key Safety Principles

1. Principle of Least Privilege

Give agents only the access they need—nothing more.

Example: A document summarization agent doesn't need email sending capability. A meeting scheduler doesn't need database write access.

2. Human-in-the-Loop

Critical decisions should require human approval.

Implementation approaches:

  • Approval workflows for high-stakes actions
  • Review queues for sensitive outputs
  • Escalation paths to human experts
  • Kill switches for emergency stops

3. Audit Trails

Track what agents do, why they did it, and what the outcomes were.

What to log:

  • All tool calls and parameters
  • Decision reasoning
  • User interactions
  • Error states and recoveries

4. Boundary Testing

Regularly test what happens when agents encounter edge cases or adversarial inputs.

Enterprise Security Considerations

According to OWASP's AI Security Initiative (ASI), organizations should implement:

Control Purpose
Zero-trust architecture Verify every request, even from trusted agents
Machine-to-machine OAuth Secure authentication for agent-to-system communication
Rate limiting Prevent runaway automation
Data isolation Separate production from testing environments

Common Risk Scenarios

Data Leakage

Risk: Agent accidentally exposes sensitive information Mitigation: Data classification, output filtering, access controls

Prompt Injection

Risk: Malicious inputs manipulate agent behavior Mitigation: Input validation, output verification, sandboxing

Autonomous Drift

Risk: Agent gradually deviates from intended behavior Mitigation: Regular audits, baseline comparisons, feedback loops

Cascading Failures

Risk: One agent error triggers failures across connected systems Mitigation: Circuit breakers, isolation, graceful degradation

Building a Safety Culture

Safety isn't just technical—it's organizational:

  1. Define acceptable use — What can agents do? What's off-limits?
  2. Establish review processes — Who checks what before deployment?
  3. Create incident response plans — What happens when things go wrong?
  4. Train your team — Everyone should understand agent capabilities and limits

The Balanced Approach

The goal isn't to eliminate all risk—that would eliminate all value. The goal is to:

  • Understand the risks you're taking
  • Implement appropriate controls
  • Monitor for problems
  • Improve continuously

Well-governed agents create more value with less risk than poorly governed ones.

:::

Quiz

Module 3 Quiz: Working with AI Agents

Take Quiz