Updated May 2026

AI Coding Agent Safety Checklist

25 concrete rules — with copy-paste hook configs, permission settings, and monitoring patterns — to prevent the database deletions, credential leaks, and runaway costs that made headlines in 2026. Every rule is enforceable, not advisory.

65%: firms hit by agent incidents
9s: PocketOS DB deleted
Rise in agent misbehavior
70%: tokens wasted (avg.)
25: rules in this checklist
Why this checklist exists. In April 2026, a Cursor agent running Claude Opus deleted PocketOS’s production database and all backups in 9 seconds. A Replit agent deleted 2,400 records then generated 4,000 fake entries to cover its tracks. A vibe-coded social network exposed 1.5 million auth tokens three days after launch. 65% of organizations experienced AI agent security incidents this year. Every one of these was preventable.
The core principle: Safety comes from constraining what agents can do, not from hoping they’ll choose wisely. An agent that can’t reach your production database can’t delete it — regardless of how it reasons. Every rule below follows this principle: make the unsafe action impossible, not just inadvisable.
🔒

Credentials & Secrets

The PocketOS agent found a root-level Railway API token in an unrelated file. That single credential gave it full authority to delete production infrastructure. Every major AI agent disaster starts with the agent finding a credential it shouldn’t have.

#1

Never give agents access to production credentials

Store production secrets in a vault (AWS Secrets Manager, HashiCorp Vault, 1Password CLI), never in .env files or config files the agent can read. If the agent can cat the file, the agent has the credential.
Enforce — Claude Code deny rule
// .claude/settings.json
{ "permissions": { "deny": ["Read(.env*)", "Read(**/*secret*)", "Read(**/*credential*)", "Read(**/.aws/*)", "Bash(cat *secret*)", "Bash(cat *.env*)"] } }
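File deny rules do not cover the environment: an agent inherits every variable exported in the shell that starts it. A minimal pre-launch check could scan env output for credential-like names before the session begins. The sketch below assumes credentials follow common naming suffixes (the list is illustrative, not exhaustive). AGENT_KEY_ALLOWLIST is a hypothetical variable for names the agent legitimately needs; it defaults to the agent's own model API key.

```shell
#!/bin/bash
# Hypothetical pre-launch check: print names of exported variables that look like credentials.
# Input: "NAME=value" lines on stdin, i.e. the output of `env`.
# The suffix list is an illustrative assumption, not an exhaustive credential taxonomy.
find_secret_env_vars() {
  cut -d= -f1 \
    | grep -E '(_TOKEN|_SECRET|_PASSWORD|_API_KEY|_ACCESS_KEY|_ACCESS_KEY_ID|^DATABASE_URL)$' \
    | grep -vxE "${AGENT_KEY_ALLOWLIST:-ANTHROPIC_API_KEY}"   # the agent's own model key is expected
}

# Usage: refuse to launch while anything matches
# found=$(env | find_secret_env_vars)
# [ -z "$found" ] || { echo "unset before launching the agent: $found" >&2; exit 1; }
```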
#2

Scope every API token to minimum required permissions

The PocketOS token was meant for managing web domains but held full infrastructure authority. Use RBAC tokens scoped to the exact operations the agent’s task requires — read-only for analysis, write to a single repo for coding.
Enforce — token scoping principle
Bad:  RAILWAY_TOKEN=root-level-token-with-full-access
Good: RAILWAY_TOKEN=scoped-to-project-read-only-no-volume-access
Best: No infrastructure tokens in the agent environment at all
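The "Best" option can be enforced at launch time instead of relying on memory. A sketch of a hypothetical wrapper that starts any command with a list of infrastructure variables removed from its environment via env -u (available in both GNU and BSD env). The variable names are examples; replace them with the ones your shell actually exports. Note that this does not hide credential files such as ~/.aws/credentials, which the deny rules in #1 address.

```shell
#!/bin/bash
# Hypothetical launcher: run a command with infrastructure tokens removed from its environment.
# The variable names are examples; list the ones your shell actually exports.
INFRA_VARS=(RAILWAY_TOKEN RAILWAY_API_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
            AWS_SESSION_TOKEN HEROKU_API_KEY DIGITALOCEAN_TOKEN)

run_without_infra_tokens() {
  local args=() v
  for v in "${INFRA_VARS[@]}"; do args+=(-u "$v"); done   # build "-u NAME" pairs for env(1)
  env "${args[@]}" "$@"
}

# Usage: run_without_infra_tokens claude
```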
#3

Block agents from reading secret files

Agents are resourceful — they will search for API keys, tokens, and passwords in config files, shell history, and other unexpected locations. Deny access to anything that could contain secrets.
Enforce — PreToolUse hook (Claude Code)
#!/bin/bash
# .claude/hooks/block-secrets.sh
INPUT=$(cat)
CMD=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('command',''))" 2>/dev/null)
FILE=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('file_path',''))" 2>/dev/null)
BLOCKED='\.env|secret|credential|\.aws/|\.ssh/|\.gnupg|password|token.*\.json|service.account'
if echo "$CMD $FILE" | grep -qiE "$BLOCKED"; then
  echo "BLOCKED: access to potential secret file" >&2; exit 2
fi
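A hook is only as good as its pattern, so it is worth feeding it fake payloads before trusting it. A sketch of a small test helper (expect_exit is a hypothetical name), assuming only what the hook above relies on: a JSON object on stdin with a tool_input field, and exit code 2 meaning "blocked".

```shell
#!/bin/bash
# Hypothetical smoke-test helper for PreToolUse hooks.
# expect_exit <hook-path> <expected-exit-code> <json-payload>
# Pipes the payload into the hook, compares its exit status, prints ok/FAIL.
expect_exit() {
  local hook=$1 want=$2 payload=$3 got
  printf '%s' "$payload" | "$hook" >/dev/null 2>&1
  got=$?
  if [ "$got" -eq "$want" ]; then
    echo "ok   (exit $got) $payload"
  else
    echo "FAIL (exit $got, wanted $want) $payload"
    return 1
  fi
}

# Usage against the hook above:
# expect_exit .claude/hooks/block-secrets.sh 2 '{"tool_input":{"command":"cat .env"}}'
# expect_exit .claude/hooks/block-secrets.sh 2 '{"tool_input":{"file_path":"/home/me/.aws/credentials"}}'
# expect_exit .claude/hooks/block-secrets.sh 0 '{"tool_input":{"command":"npm test"}}'
```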
#4

Rotate any credential an agent has seen

If an agent reads a credential — even accidentally, even if it was in a diff or error message — treat it as exposed. The credential is now in the conversation context and could be referenced, leaked, or cached.
Enforce — PostToolUse audit hook
#!/bin/bash
# .claude/hooks/credential-audit.sh — log any tool output containing key-like patterns
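INPUT=$(cat)
# PostToolUse payloads put the result under "tool_response" (often a JSON object), so serialize it before scanning
OUTPUT=$(echo "$INPUT" | python3 -c "import sys,json; print(json.dumps(json.load(sys.stdin).get('tool_response','')))" 2>/dev/null)
if echo "$OUTPUT" | grep -qE '(sk-[A-Za-z0-9_-]{20,}|AKIA[0-9A-Z]{16}|gh[po]_[A-Za-z0-9]{20,}|glpat-|xoxb-|-----BEGIN)'; then
  # Write to a log file as well: stderr from a hook that exits 0 is easy to miss
  echo "$(date -u +%FT%TZ) agent may have seen a credential; rotate it" >> "${CLAUDE_PROJECT_DIR:-.}/.claude/credential-alerts.log"
  echo "[ALERT] Agent may have seen a credential; rotate immediately" >&2
fi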
#5

Audit .env files before every agent session

Even with deny rules, a misnamed file (config.local.json, vars.sh) can contain production credentials. Audit the working directory before handing it to an agent.
Enforce — pre-session scan
# Run before starting an agent session
grep -rlE '(sk-|AKIA|ghp_|DATABASE_URL=postgres://|mongodb\+srv://|-----BEGIN)' . \
  --include='*.env*' --include='*.json' --include='*.yml' --include='*.yaml' \
  --include='*.toml' --include='*.sh' --include='*.conf' 2>/dev/null \
  && echo "WARNING: potential production credentials found in working directory"
💥

Destructive Operations

The PocketOS agent called railway volume delete. The Replit agent ran DELETE FROM without a WHERE clause. These are preventable with hard-blocking hooks that fire before the command executes.

#6

Block destructive shell commands

Commands like rm -rf /, chmod -R 777, and kill -9 on system processes should never be executed by an agent. Block the patterns, not the tools.
Enforce — deny rules (Claude Code)
// .claude/settings.json
{ "permissions": { "deny": [
  "Bash(rm -rf /)", "Bash(rm -rf ~)", "Bash(rm -rf .)",
  "Bash(chmod -R 777*)", "Bash(chmod 777*)",
  "Bash(:(){ :|:& };:)", "Bash(mkfs*)", "Bash(dd if=*of=/dev/*)",
  "Bash(killall*)", "Bash(shutdown*)", "Bash(reboot*)"
] } }
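As written, these deny rules match the listed command strings, so trivially different spellings (rm -fr /, rm -r -f ~, rm --recursive --force .) can slip past. One way to cover the variants is a pattern check inside a PreToolUse hook. The helper below is a hypothetical sketch, not a shell parser: it splits on whitespace without handling quotes, and it flags every recursive forced rm, including routine ones such as rm -rf node_modules, so narrow it if that proves too strict.

```shell
#!/bin/bash
# Hypothetical helper: succeed (return 0) if a command line contains an `rm`
# that is both recursive and forced, in any flag order or spelling.
# Naive whitespace splitting (no quote handling): a tripwire, not a parser.
is_recursive_force_rm() {
  local -a words
  local word in_rm=0 recursive=0 force=0
  read -r -d '' -a words <<< "$1"   # split into words without glob expansion
  for word in "${words[@]}"; do
    case "$word" in
      rm|*/rm) in_rm=1; recursive=0; force=0 ;;    # start of a new rm invocation
      ';'|'&&'|'||'|'|') in_rm=0 ;;                # a separator ends it
      --recursive) [ "$in_rm" = 1 ] && recursive=1 ;;
      --force) [ "$in_rm" = 1 ] && force=1 ;;
      --*) ;;                                      # other long options: ignore
      -*)
        if [ "$in_rm" = 1 ]; then                  # short flags may be bundled, e.g. -rf / -fR
          case "$word" in *[rR]*) recursive=1 ;; esac
          case "$word" in *f*) force=1 ;; esac
        fi ;;
    esac
    if [ "$in_rm" = 1 ] && [ "$recursive" = 1 ] && [ "$force" = 1 ]; then
      return 0
    fi
  done
  return 1
}

# In a PreToolUse hook, after extracting CMD as in the hooks above:
# is_recursive_force_rm "$CMD" && { echo "BLOCKED: recursive forced rm" >&2; exit 2; }
```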
#7

Block dangerous SQL operations

An agent running DROP TABLE, TRUNCATE, or DELETE without WHERE can wipe your data in a single statement. Block these patterns in any database CLI the agent uses.
Enforce — PreToolUse hook
#!/bin/bash
INPUT=$(cat)
CMD=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('command',''))" 2>/dev/null)
if echo "$CMD" | grep -qiE '(DROP\s+(TABLE|DATABASE|SCHEMA)|TRUNCATE|DELETE\s+FROM\s+\S+\s*$|DELETE\s+FROM\s+\S+\s*;)'; then
  echo "BLOCKED: destructive SQL operation" >&2; exit 2
fi
#8

Prevent writes outside the project directory

An agent that can write to /etc/, ~/.ssh/, or ~/.bashrc can permanently alter your system. Constrain file operations to the working directory.
Enforce — PreToolUse hook for Write/Edit
#!/bin/bash
INPUT=$(cat)
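FILE=$(echo "$INPUT" | python3 -c "import sys,json,os; d=json.load(sys.stdin).get('tool_input',{}); p=d.get('file_path') or d.get('path') or d.get('notebook_path') or ''; print(os.path.realpath(p) if p else '')" 2>/dev/null)
# Resolve both sides so "../" segments and symlinks cannot slip past the prefix check
PROJECT_DIR=$(python3 -c "import os,sys; print(os.path.realpath(sys.argv[1]))" "${CLAUDE_PROJECT_DIR:-$(pwd)}")
case "$FILE" in
  "$PROJECT_DIR"/*) ;; # allowed: inside project
  *) echo "BLOCKED: write outside project directory: $FILE" >&2; exit 2 ;; # also hit when no path was parsed: fail closed
esac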
#9

Block infrastructure CLI destroy commands

Infrastructure CLIs (Terraform, AWS CLI, gcloud, Railway, Heroku) can destroy production resources with a single command. Block the destructive verbs.
Enforce — deny rules
// .claude/settings.json
{ "permissions": { "deny": [
  "Bash(terraform destroy*)", "Bash(terraform apply -auto-approve*)",
  "Bash(aws *delete*)", "Bash(aws *terminate*)", "Bash(aws *remove*)",
  "Bash(aws s3 rm*)", "Bash(aws s3 rb*)",
  "Bash(gcloud *delete*)", "Bash(railway *delete*)", "Bash(railway volume*)",
  "Bash(heroku destroy*)", "Bash(kubectl delete*)",
  "Bash(docker rm -f*)", "Bash(docker system prune*)"
] } }
#10

Run agents in sandboxed environments for untrusted tasks

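For tasks that touch unfamiliar code, external dependencies, or anything production-adjacent, use Docker, E2B, or Firecracker to isolate the agent. A container whose network egress is limited to the model API can't reach your production database even if it finds the credentials.
Enforce — Docker with restricted network
# Run Claude Code in a container that sees only the project directory.
# --network=none would also cut off the model API (and npx's package download),
# so restrict egress to the API host with a firewall or proxy instead; Anthropic's
# reference devcontainer for Claude Code ships an allowlist firewall script for this.
docker run --rm -it \
  -v "$(pwd):/workspace" -w /workspace -e ANTHROPIC_API_KEY \
  node:22-slim npx @anthropic-ai/claude-code --dangerously-skip-permissions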
🔌

Git & Deployment

An agent that can push to main and trigger a deploy pipeline can ship broken code to production. Enforce branch isolation and require CI gates between agent code and production.

#11

Block force-push to main and production branches

A force-push to main can overwrite the team’s work and trigger auto-deploy pipelines. Agents should never have this capability.
Enforce — deny rules + server-side branch protection
// .claude/settings.json
{ "permissions": { "deny": [
  "Bash(git push --force*)", "Bash(git push -f*)",
  "Bash(git push origin main*)", "Bash(git push origin master*)",
  "Bash(git push origin prod*)", "Bash(git reset --hard*)"
] } }

// Also: enable branch protection rules on GitHub/GitLab — server-side is the real gate
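Git itself can add a second local gate alongside the deny rules. A client-side pre-push hook receives one line per ref being pushed ("local ref, local sha, remote ref, remote sha") on stdin, and a non-zero exit aborts the push. A sketch, with the protected branch names as an assumption. Since git push --no-verify skips the hook, it complements server-side protection rather than replacing it.

```shell
#!/bin/bash
# .git/hooks/pre-push (sketch): refuse pushes that update protected branches.
# Git feeds lines of "<local ref> <local sha> <remote ref> <remote sha>" on stdin.
PROTECTED_REFS='^refs/heads/(main|master|prod|production)$'   # assumption: adjust to your repo

check_push_refs() {
  local local_ref local_sha remote_ref remote_sha
  while read -r local_ref local_sha remote_ref remote_sha; do
    if [[ "$remote_ref" =~ $PROTECTED_REFS ]]; then
      echo "BLOCKED: push to protected branch ${remote_ref#refs/heads/}" >&2
      return 1
    fi
  done
  return 0
}

# When installed as .git/hooks/pre-push, end the file with:
# check_push_refs || exit 1
```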
#12

Require branch-per-task for agent work

Agents working directly on main can step on each other’s changes and ship untested code. Each agent task gets its own branch (or git worktree). Code reaches main only through a reviewed PR.
Enforce — CLAUDE.md rule + worktree
# In CLAUDE.md:
# NEVER commit directly to main. Always create a feature branch first:
#   git checkout -b agent/<task-description>
# Open a PR when done. Do not merge without review.

# Or use Claude Code's built-in worktree isolation:
claude --worktree "implement the new auth flow"
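The same isolation works with plain git, independent of any agent: git worktree add gives each task its own checkout on its own branch. A sketch of a hypothetical helper; the agent/ prefix follows the CLAUDE.md rule above, and placing worktrees next to the main checkout is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: create a branch and a worktree for one agent task.
# new_agent_worktree <task-slug> [base-branch]
# Creates branch agent/<task-slug> from base (default: main) in ../<repo>-<task-slug>
# and prints the worktree path on stdout.
new_agent_worktree() {
  local slug=$1 base=${2:-main} top dir
  top=$(git rev-parse --show-toplevel) || return 1
  dir="$(dirname "$top")/$(basename "$top")-$slug"
  git worktree add -q -b "agent/$slug" "$dir" "$base" >&2 || return 1
  printf '%s\n' "$dir"
}

# Usage: cd "$(new_agent_worktree fix-login)" && claude
```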
#13

Block direct deployment commands

Commands like vercel deploy --prod, fly deploy, or kubectl apply to production namespaces should never be run by an agent. Deploy through CI, not through the agent.
Enforce — deny rules
// .claude/settings.json
{ "permissions": { "deny": [
  "Bash(vercel deploy --prod*)", "Bash(fly deploy*)",
  "Bash(kubectl apply*-n prod*)", "Bash(kubectl apply*--namespace prod*)",
  "Bash(eb deploy*)", "Bash(gcloud app deploy*)",
  "Bash(serverless deploy --stage prod*)", "Bash(cdk deploy*)"
] } }
#14

Require CI to pass before merge

Agent-generated code has a 43% failure rate in production. CI catches what the agent misses. Enforce status checks on your repo so no PR merges without passing tests, linting, and security scans.
Enforce — GitHub branch protection
# GitHub: Settings → Branches → Branch protection rules
# ✓ Require status checks to pass before merging
# ✓ Require branches to be up to date before merging
# ✓ Required checks: test, lint, security-scan
# ✓ Require a pull request before merging (1+ approvals)
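The same settings can be applied from a script through GitHub's REST endpoint for branch protection (PUT /repos/{owner}/{repo}/branches/{branch}/protection). A sketch of a hypothetical helper that only builds the request body: the check names come from the list above, while enforce_admins (apply the rules to admins too) and the single required approval are assumptions to adjust.

```shell
#!/bin/bash
# Hypothetical helper: print a branch-protection request body for GitHub's REST API.
# protection_payload <check-name>...
# "strict": true corresponds to "require branches to be up to date before merging".
protection_payload() {
  local contexts="" c
  for c in "$@"; do contexts="${contexts:+$contexts,}\"$c\""; done
  printf '{"required_status_checks":{"strict":true,"contexts":[%s]},' "$contexts"
  printf '"enforce_admins":true,'
  printf '"required_pull_request_reviews":{"required_approving_review_count":1},'
  printf '"restrictions":null}\n'
}

# Usage (OWNER/REPO are placeholders; requires an authenticated gh CLI):
# protection_payload test lint security-scan \
#   | gh api -X PUT repos/OWNER/REPO/branches/main/protection --input -
```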
#15

Pin dependencies — block wildcard version bumps

Agents love to “fix” version conflicts by upgrading to * or latest. This opens you to supply chain attacks. Pin versions and block npm install without --save-exact.
Enforce — PostToolUse validation hook
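#!/bin/bash
# PostToolUse: after each tool call, reject "*" or "latest" versions in package.json.
# The edit has already happened, so exit 2 feeds the message back to the agent to fix it.
cat > /dev/null   # consume the hook payload; this check only needs the file on disk
PKG="${CLAUDE_PROJECT_DIR:-.}/package.json"
if [ -f "$PKG" ] && grep -qE '"(\*|latest)"' "$PKG"; then
  echo "REJECTED: wildcard or latest version in package.json; pin an exact version" >&2; exit 2
fi
# One-time setup so npm install writes exact versions: npm config set save-exact true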
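The hook above catches "*" and "latest", but caret and tilde ranges (^1.2.3, ~1.2.3) are what npm install writes by default. A sketch of a hypothetical check that lists every dependency whose spec is not an exact version, calling Python's json module from the shell as the existing hooks do. Treating "exact" as a plain X.Y.Z version with an optional prerelease/build suffix is a simplifying assumption; peerDependencies are skipped because they are usually ranges on purpose.

```shell
#!/bin/bash
# Hypothetical check: print "name@spec" for package.json dependencies that are not pinned.
# "Pinned" here means X.Y.Z with an optional -prerelease/+build suffix; git URLs,
# file: paths and workspace: specs are also reported for manual review.
unpinned_deps() {
  python3 - "${1:-package.json}" <<'PY'
import json, re, sys

EXACT = re.compile(r"^\d+\.\d+\.\d+([-+][0-9A-Za-z.+-]+)?$")
with open(sys.argv[1]) as f:
    pkg = json.load(f)
for section in ("dependencies", "devDependencies", "optionalDependencies"):
    for name, spec in pkg.get(section, {}).items():
        if not EXACT.match(spec):
            print(f"{name}@{spec}")
PY
}

# Usage: [ -z "$(unpinned_deps)" ] || echo "pin these versions before merging"
```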
💰

Cost Control

A major company burned its entire 2026 AI budget by April. A single developer hit $4,200 in API fees over one weekend. Agents burn 50x more tokens than chat because each loop iteration resends the full context. Cost control is safety.
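The resend effect compounds with session length. A back-of-the-envelope model, assuming the context starts at c0 tokens and grows by g tokens per turn (both illustrative numbers, not measurements): turn k resends c0 + (k-1)*g tokens, so n turns send n*c0 + g*n*(n-1)/2 input tokens in total, which grows with the square of the turn count.

```shell
#!/bin/bash
# Total input tokens over n agent turns when every turn resends the whole context.
# cumulative_input_tokens <start-context> <growth-per-turn> <turns>
# Turn k sends start + (k-1)*growth tokens; the sum is n*start + growth*n*(n-1)/2.
cumulative_input_tokens() {
  local c0=$1 g=$2 n=$3
  echo $(( n * c0 + g * n * (n - 1) / 2 ))
}

# cumulative_input_tokens 10000 5000 25   # 1750000
# cumulative_input_tokens 10000 5000 50   # 6625000
```

With the assumed 10,000-token starting context growing by 5,000 tokens per turn, 25 turns send 1,750,000 input tokens and 50 turns send 6,625,000: doubling the turn count nearly quadruples the input bill.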

#16

Set a per-session turn limit

Runaway agents loop endlessly — re-reading the context, retrying the same approach, burning tokens. A hard turn limit (15–25 iterations) prevents a single session from consuming unbounded resources.
Enforce — Claude Code flag
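# Limit to 25 turns (the flag applies to non-interactive print mode, -p)
claude -p --max-turns 25 "implement the user profile page"

# For Codex CLI: use its iteration limit if your version provides one (confirm the flag name with codex --help)
codex --max-iterations 20 "fix the authentication bug"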
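A turn cap does not bound wall-clock time. A minimal complement, assuming GNU coreutils timeout is installed (on macOS it is available as gtimeout via Homebrew coreutils), stops the process after a fixed duration; timeout exits with status 124 when the limit was hit. run_with_deadline is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a wall-clock limit and report if it was cut off.
# run_with_deadline <duration> <command...>   (duration uses timeout(1) syntax, e.g. 30m)
run_with_deadline() {
  local limit=$1; shift
  # Send TERM at the deadline, then KILL 30s later if the process is still running
  timeout --signal=TERM --kill-after=30s "$limit" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "deadline of $limit reached; agent stopped" >&2
  fi
  return "$status"
}

# Example: cap a non-interactive session at 25 turns and 30 minutes.
# run_with_deadline 30m claude -p --max-turns 25 "implement the user profile page"
```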
#17

Set API-level monthly spend caps

Your API provider’s billing dashboard is the last line of defense. Set a hard cap that stops requests when the budget is exhausted — a $500 cap feels painful until you see the alternative.
Enforce — provider billing settings
# Anthropic: console.anthropic.com → Billing → Usage limits
# OpenAI: platform.openai.com → Settings → Limits
# Google Cloud: Set budget alerts + auto-disable on Billing → Budgets

# Rule of thumb: set the cap at 1.5× your expected monthly spend
# Review and adjust monthly — never set "unlimited"
#18

Alert on per-session cost thresholds

One agent session should not cost $200. Set session-level alerts that trigger when a single task exceeds a dollar threshold — this catches loops before they become invoices.
Enforce — amux monitoring
# amux tracks per-session token spend in real-time via the dashboard
# Set alerts at your threshold — e.g., kill sessions over $50:
# The self-healing watchdog can auto-kill sessions exceeding cost limits

# For standalone Claude Code, run the /cost command inside the session
# to see its token usage and cost so far
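Outside amux, the same alert can be approximated from token counts. A sketch of two hypothetical helpers that convert input and output token counts to dollars and test the result against a threshold. The per-million-token prices are parameters to fill from your provider's current price list (none are hard-coded), and the $50 threshold mirrors the example above.

```shell
#!/bin/bash
# Hypothetical helper: dollar cost of a session from token counts and per-million-token prices.
# session_cost <input-tokens> <output-tokens> <input-$/M> <output-$/M>
session_cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.2f\n", i / 1e6 * pi + o / 1e6 * po }'
}

# over_budget <cost> <threshold>: succeeds (exit 0) when cost > threshold
over_budget() {
  awk -v c="$1" -v t="$2" 'BEGIN { exit !(c > t) }'
}

# cost=$(session_cost "$IN_TOKENS" "$OUT_TOKENS" "$IN_PRICE" "$OUT_PRICE")
# over_budget "$cost" 50 && echo "session over \$50: \$$cost"
```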
#19

Use model routing to match cost to task complexity

Running Opus ($15/M input tokens) for linting is like flying first class to the corner store. Route simple tasks (formatting, renaming, test scaffolding) to Haiku ($0.25/M) and save Opus for complex architecture work.
Enforce — model selection
# Simple tasks → cheaper model
claude --model haiku "rename userId to user_id across the codebase"

# Complex tasks → full model
claude --model opus "redesign the authentication system to support OAuth2"

# See our cost optimization guide: /guides/cost-optimization/
#20

Kill idle sessions automatically

An agent that’s stuck waiting for input, looping on an error, or idling with a full context window is burning money doing nothing. Auto-kill sessions that show no progress after a timeout.
Enforce — amux self-healing watchdog
# amux detects stuck sessions automatically:
# - Prompt stuck (waiting for input with no progress)
# - Context exhaustion (near context window limit)
# - Crash recovery (agent process died)
# Configure in amux: self-healing runs every 30s by default
# See: /guides/self-healing-configuration/
📡

Monitoring & Kill Switches

You can’t prevent what you can’t see. Monitoring is the safety net under all other rules. When everything else fails, the kill switch is what saves you.

#21

Monitor every agent session in real time

If an agent is deleting files, running unexpected commands, or generating nonsense, you need to see it before it finishes — not in tomorrow’s git log.
Enforce — live monitoring
# amux: web dashboard shows all sessions with live output streaming
# Open https://localhost:8822 or use the iOS app

# Without amux, use tmux panes (Claude Code writes session transcripts as .jsonl files):
tmux split-window -h 'tail -f ~/.claude/projects/*/*.jsonl'

# Or peek at any session via amux API:
curl -sk "$AMUX_URL/api/sessions/SESSION_NAME/peek?lines=50"
#22

Log all agent actions to an append-only store

For compliance (SOC 2, EU AI Act), you need an audit trail of what every agent did. Agent logs must be append-only so the agent cannot modify its own history.
Enforce — log forwarding
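# Forward session logs to an S3 bucket with Object Lock enabled (write-once):
# set a default retention once, then plain uploads inherit it
aws s3api put-object-lock-configuration --bucket your-audit-bucket \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":365}}}'
aws s3 cp ~/.claude/projects/ s3://your-audit-bucket/agent-logs/ --recursive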

# Or pipe to your SIEM (Datadog, Splunk, ELK):
# amux board history provides a structured task audit trail via /api/board
#23

Set up crash detection and auto-recovery

A crashed agent is not safe — it may leave partial changes, uncommitted code, or held locks. Detect crashes quickly and either restart cleanly or alert a human.
Enforce — amux self-healing
# amux's watchdog monitors every session and detects:
# - Process crashes → auto-restart with original prompt
# - Context exhaustion → compact and continue
# - Stuck prompts → auto-dismiss and resume
# - Permission dialogs → respond based on policy

# Configure: /guides/self-healing-configuration/
# All recoveries are logged and visible in the dashboard
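Without an orchestrator, a bounded restart loop covers the basic case: restart a crashed process a few times, then stop and alert a human instead of looping forever. A sketch of a hypothetical supervise helper; the restart limit, the back-off delay (SUPERVISE_DELAY, also hypothetical), and the alert (a message on stderr) are placeholders for your own policy.

```shell
#!/bin/bash
# Hypothetical supervisor: rerun a command when it exits non-zero, up to a fixed number of restarts.
# supervise <max-restarts> <command...>
supervise() {
  local max=$1 attempt=0 status; shift
  while true; do
    "$@"; status=$?
    [ "$status" -eq 0 ] && return 0                        # clean exit: done
    if [ "$attempt" -ge "$max" ]; then
      echo "ALERT: '$*' failed $((attempt + 1)) times (last status $status); needs a human" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    echo "restart $attempt/$max after exit status $status" >&2
    sleep "${SUPERVISE_DELAY:-5}"                           # back off before restarting
  done
}

# Usage: supervise 3 claude -p --max-turns 25 "fix the failing tests"
```

A restart does not undo partial edits, so checking git status in the working tree before each restart keeps a crashed run from compounding into the next one.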
#24

Run a daily audit of agent-generated changes

Review agent output every morning. An agent that quietly removes error handling, disables security checks, or introduces subtle bugs is worse than one that crashes loudly.
Enforce — daily review script
# Morning review: what did agents do overnight?
git log --oneline --since='12 hours ago' --all
git diff main...HEAD --stat
# Check removed lines that mention safety checks (the grep -v drops "--- a/file" headers):
git diff main...HEAD | grep -E '^-' | grep -v '^---' | grep -iE '\b(auth|permission|validate|sanitize|escape)'

# See our review guide: /guides/reviewing-ai-agent-code/
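When agents follow the agent/<task> branch convention from rule #12, the morning review can walk every agent branch instead of only the current one. A sketch of a hypothetical helper built on git for-each-ref; the main base branch and the agent/ prefix are assumptions.

```shell
#!/bin/bash
# Hypothetical review loop: summarize what each agent/* branch changed relative to a base branch.
# The three-dot diff (base...branch) shows only the branch's own changes since it forked.
review_agent_branches() {
  local base=${1:-main} branch
  git for-each-ref --format='%(refname:short)' refs/heads/agent |
    while read -r branch; do
      echo "== $branch ($(git rev-list --count "$base..$branch") commits ahead of $base)"
      git diff --stat "$base...$branch"
    done
}

# Usage: review_agent_branches main | less
```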
#25

Have a one-command kill switch for all agents

When something goes wrong, you need to stop everything immediately. Not “find the tab and click stop” — one command, all agents, right now.
Enforce — kill switch options
# amux: kill all sessions from CLI, dashboard, or phone
curl -sk -X POST "$AMUX_URL/api/sessions/kill-all"

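# tmux: kill the dedicated agent server (assumes agents were started with tmux -L agents)
tmux -L agents kill-server

# Nuclear option: kill all Claude Code processes. -f matches full command lines,
# so preview the matches first with: pgrep -fl claude
pkill -f "claude" && echo "All agent processes terminated"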

# Tip: save this as an alias you can run from anywhere:
alias killall-agents='curl -sk -X POST "$AMUX_URL/api/sessions/kill-all"'

Safety by Tool: Who Protects What?

Different AI coding tools provide different safety mechanisms. Here’s what each gives you out of the box versus what you need to add yourself.

Safety feature | Claude Code | Codex CLI | Cursor | Gemini CLI | amux
Permission system | allow/deny rules | sandbox modes | confirmation dialogs | confirmation prompts | via agent hooks
PreToolUse hooks | full hook system | | | | via Claude Code
Sandbox by default | opt-in Docker | network-disabled | | opt-in |
Credential deny rules | file path deny | sandbox isolation | | | via hooks
Cost limits | max-turns flag | iteration limit | usage-based billing | free 1k/day | per-session tracking
Live monitoring | terminal only | terminal only | IDE only | terminal only | web + mobile + API
Crash recovery | | | | | self-healing watchdog
Kill all agents | manual per-session | manual | manual | manual | one-command kill-all
Audit trail | session files | session logs | IDE history | session logs | board + session logs
No single tool covers all 25 rules. Codex CLI’s default sandbox is the strongest built-in protection. Claude Code’s hooks system is the most flexible for custom policies. amux adds the fleet-level monitoring and kill switches that individual tools lack. The safest setup combines tool-level controls (hooks, sandboxing) with orchestration-level monitoring (dashboards, cost tracking, kill switches). See our security hardening and sandboxing guides for deeper dives.

FAQ

Why do AI coding agents cause production incidents?

AI coding agents combine high autonomy with low judgment about blast radius. In the PocketOS incident, a Cursor agent found a root-level API token and used it to delete a production database in 9 seconds. The agent had the capability to destroy infrastructure but lacked the judgment to assess consequences. Safety comes from constraining what agents can do (permissions, hooks, sandboxing), not from hoping they’ll choose wisely.

What is the most important safety rule?

Rule 1: never give agents access to production credentials. The majority of AI agent disasters trace back to the agent finding credentials with more power than the task required. Store production secrets in a vault (AWS Secrets Manager, HashiCorp Vault, 1Password CLI), never in .env files the agent can read.

How do I prevent an AI agent from deleting my database?

Three layers: (1) Never store production database credentials where agents can read them. (2) Use a PreToolUse hook to block destructive SQL and CLI commands (exit code 2 for hard blocks). (3) Run agents in a sandbox with no network access to production hosts. Any single layer failing is survivable. All three failing is the PocketOS scenario.

Do these rules work with Cursor, Codex, and other tools?

The principles are universal. The enforcement differs: Claude Code uses hooks and permission rules. Codex CLI uses its built-in sandbox. Cursor has confirmation dialogs but no hooks — you rely on prompt-level caution. For any tool, filesystem permissions, Docker sandboxing, and credential isolation work regardless of which agent you run. See the tool comparison table above.

Should I use YOLO mode or --dangerously-skip-permissions?

Only with compensating controls. YOLO mode and --dangerously-skip-permissions remove the human-in-the-loop gate. This is fine IF you have: (1) PreToolUse hooks that hard-block destructive operations (hooks run even with --dangerously-skip-permissions), (2) the agent runs in a sandboxed environment, and (3) monitoring that can kill the session if it goes off-rails. Without all three, these modes are how the PocketOS incident happened. See our YOLO mode guide.

How do I set a cost limit on AI coding agents?

Three levels: (1) API level: monthly spend caps in your Anthropic, OpenAI, or Google Cloud billing dashboard. (2) Session level: the --max-turns flag to cap iterations. (3) Fleet level: amux tracks per-session spend in real time and can kill sessions exceeding thresholds. A 15–25 turn limit prevents a single loop from becoming a $4,000 weekend.

How do I audit what an AI coding agent did?

Three sources: (1) Git history: every file change is in the diff. (2) Session logs: Claude Code stores conversation history in ~/.claude/projects/. (3) Orchestration logs: amux captures full session output and provides a board audit trail via /api/board. For compliance (EU AI Act, SOC 2), pipe logs to an append-only store.
