AI Coding Agent Safety Checklist
25 concrete rules — with copy-paste hook configs, permission settings, and monitoring patterns — to prevent the database deletions, credential leaks, and runaway costs that made headlines in 2026. Every rule is enforceable, not advisory.
Credentials & Secrets
The PocketOS agent found a root-level Railway API token in an unrelated file. That single credential gave it full authority to delete production infrastructure. Most major AI agent disasters start with the agent finding a credential it shouldn’t have.
Never give agents access to production credentials
Keep production credentials out of .env files or config files the agent can read. If the agent can cat the file, the agent has the credential.
// .claude/settings.json
{ "permissions": { "deny": ["Read(.env*)", "Read(**/*secret*)", "Read(**/*credential*)", "Read(**/.aws/*)", "Bash(cat *secret*)", "Bash(cat *.env*)"] } }
Scope every API token to minimum required permissions
Bad: RAILWAY_TOKEN=root-level-token-with-full-access
Good: RAILWAY_TOKEN=scoped-to-project-read-only-no-volume-access
Best: No infrastructure tokens in the agent environment at all
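The "Best" case can be checked before a session starts. The following is a minimal sketch, not a complete scanner: the function name `infra_token_names` and the list of provider prefixes are assumptions chosen for illustration, and the match is on variable names only, never on values.

```shell
# Sketch: print the names of environment variables that look like
# infrastructure tokens. Reads NAME=value lines on stdin.
# The provider prefixes below are illustrative assumptions; extend them for your stack.
infra_token_names() (
  # Keep only the NAME part before the first '=' and filter by naming pattern.
  cut -d= -f1 \
    | grep -E '^(RAILWAY|AWS|GCP|GOOGLE|HEROKU|VERCEL|FLY)_[A-Z0-9_]*(TOKEN|KEY|SECRET)' \
    || true
)

# Possible usage before launching an agent:
#   found=$(env | infra_token_names)
#   [ -z "$found" ] || { echo "Refusing to start, tokens in env: $found" >&2; exit 1; }
```

Matching on names keeps the secret values out of logs, at the cost of missing tokens stored under unconventional variable names.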
Block agents from reading secret files
#!/bin/bash
# .claude/hooks/block-secrets.sh
INPUT=$(cat)
CMD=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('command',''))" 2>/dev/null)
FILE=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('file_path',''))" 2>/dev/null)
BLOCKED='\.env|secret|credential|\.aws/|\.ssh/|\.gnupg|password|token.*\.json|service.account'
if echo "$CMD $FILE" | grep -qiE "$BLOCKED"; then
  echo "BLOCKED: access to potential secret file" >&2
  exit 2
fi
Rotate any credential an agent has seen
#!/bin/bash
# .claude/hooks/credential-audit.sh — log any tool output containing key-like patterns
INPUT=$(cat)
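# Claude Code PostToolUse payloads carry the result under tool_response (it may be an object,
# so stringify before slicing); tool_output is kept as a fallback.
OUTPUT=$(echo "$INPUT" | python3 -c "import sys,json; d=json.load(sys.stdin); print(str(d.get('tool_response', d.get('tool_output','')))[:2000])" 2>/dev/null)
if echo "$OUTPUT" | grep -qiE '(sk-|AKIA|ghp_|gho_|glpat-|xoxb-|-----BEGIN)'; then
  echo "[ALERT] Agent may have seen a credential: rotate immediately" >&2
fi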
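Knowing that something key-like appeared is more useful when the alert also says which provider to rotate. The sketch below is one way to label matches, under stated assumptions: the function name `credential_kinds` and the label strings are hypothetical, and the patterns are deliberately loose prefixes (like the hook above), so false positives are expected.

```shell
# Sketch: read text on stdin and print one label per credential kind found.
# Labels and the function name are illustrative, not part of any tool's API.
credential_kinds() (
  text=$(cat)
  # check PATTERN LABEL: print LABEL if PATTERN occurs anywhere in the text.
  check() {
    # -e keeps grep from treating a pattern that starts with '-' as an option.
    if printf '%s\n' "$text" | grep -qE -e "$1"; then echo "$2"; fi
  }
  check 'AKIA[0-9A-Z]{16}'     'aws-access-key-id'
  check 'gh[po]_[A-Za-z0-9]+'  'github-token'
  check 'glpat-'               'gitlab-token'
  check 'xoxb-'                'slack-bot-token'
  check 'sk-[A-Za-z0-9]'       'sk-style-api-key'
  check '-----BEGIN'           'pem-block'
)

# Possible usage inside the audit hook:
#   echo "$OUTPUT" | credential_kinds | while read -r kind; do
#     echo "[ALERT] rotate: $kind" >&2
#   done
```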
Audit .env files before every agent session
Files other than .env (config.local.json, vars.sh) can contain production credentials. Audit the working directory before handing it to an agent.
# Run before starting an agent session
grep -rlE '(sk-|AKIA|ghp_|DATABASE_URL=postgres://|mongodb\+srv://|-----BEGIN)' . \
  --include='*.env*' --include='*.json' --include='*.yml' --include='*.yaml' \
  --include='*.toml' --include='*.sh' --include='*.conf' 2>/dev/null \
  && echo "WARNING: potential production credentials found in working directory"
Destructive Operations
The PocketOS agent called railway volume delete. The Replit agent ran DELETE FROM without a WHERE clause. These are preventable with hard-blocking hooks that fire before the command executes.
Block destructive shell commands
Commands such as rm -rf /, chmod -R 777, and kill -9 on system processes should never be executed by an agent. Block the patterns, not the tools.
// .claude/settings.json
{ "permissions": { "deny": [
"Bash(rm -rf /)", "Bash(rm -rf ~)", "Bash(rm -rf .)",
"Bash(chmod -R 777*)", "Bash(chmod 777*)",
"Bash(:(){ :|:& };:)", "Bash(mkfs*)", "Bash(dd if=*of=/dev/*)",
"Bash(killall*)", "Bash(shutdown*)", "Bash(reboot*)"
] } }
Block dangerous SQL operations
DROP TABLE, TRUNCATE, or DELETE without WHERE can wipe your data in a single statement. Block these patterns in any database CLI the agent uses.
#!/bin/bash
INPUT=$(cat)
CMD=$(echo "$INPUT" | python3 -c "import sys,json; print(json.load(sys.stdin).get('tool_input',{}).get('command',''))" 2>/dev/null)
if echo "$CMD" | grep -qiE '(DROP\s+(TABLE|DATABASE|SCHEMA)|TRUNCATE|DELETE\s+FROM\s+\S+\s*$|DELETE\s+FROM\s+\S+\s*;)'; then
  echo "BLOCKED: destructive SQL operation" >&2
  exit 2
fi
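A blocking pattern is only as good as the cases it has been tried against. The sketch below wraps an equivalent pattern in a function so it can be exercised on sample statements. It assumes a hypothetical helper name, `is_destructive_sql`, and swaps the `\s` and `\S` shorthands for POSIX character classes so it does not depend on GNU grep extensions.

```shell
# Sketch: succeed (exit status 0) when the argument contains a destructive
# SQL statement: DROP TABLE/DATABASE/SCHEMA, TRUNCATE, or a DELETE FROM
# that ends right after the table name (no WHERE clause).
is_destructive_sql() (
  printf '%s\n' "$1" | grep -qiE \
    '(DROP[[:space:]]+(TABLE|DATABASE|SCHEMA)|TRUNCATE|DELETE[[:space:]]+FROM[[:space:]]+[^[:space:]]+[[:space:]]*(;|$))'
)

# Possible spot checks:
#   is_destructive_sql 'DELETE FROM users;'             && echo blocked
#   is_destructive_sql 'DELETE FROM users WHERE id = 1' || echo allowed
```

Pattern matching like this is a coarse filter: SQL passed through a file, a heredoc, or an ORM never reaches the command string, which is why the credential and sandbox layers still matter.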
Prevent writes outside the project directory
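Writes to /etc/, ~/.ssh/, or ~/.bashrc can permanently alter your system. Constrain file operations to the working directory.
#!/bin/bash
INPUT=$(cat)
FILE=$(echo "$INPUT" | python3 -c "import sys,json; d=json.load(sys.stdin).get('tool_input',{}); print(d.get('file_path', d.get('path','')))" 2>/dev/null)
PROJECT_DIR="$(pwd -P)"
# Expand ~ and resolve relative paths, ../ segments, and symlinks before comparing,
# so paths like ../../etc/passwd cannot slip past a plain prefix match
RESOLVED=$(python3 -c "import os,sys; print(os.path.realpath(os.path.expanduser(sys.argv[1])))" "$FILE")
case "$RESOLVED" in
  "$PROJECT_DIR"/*) ;; # allowed: inside project
  *) echo "BLOCKED: write outside project directory: $FILE" >&2; exit 2 ;;
esac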
Block infrastructure CLI destroy commands
// .claude/settings.json
{ "permissions": { "deny": [
"Bash(terraform destroy*)", "Bash(terraform apply -auto-approve*)",
"Bash(aws *delete*)", "Bash(aws *terminate*)", "Bash(aws *remove*)",
"Bash(gcloud *delete*)", "Bash(railway *delete*)", "Bash(railway volume*)",
"Bash(heroku destroy*)", "Bash(kubectl delete*)",
"Bash(docker rm -f*)", "Bash(docker system prune*)"
] } }
Run agents in sandboxed environments for untrusted tasks
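# Run Claude Code in a container with no network access
# Caveat: --network=none also cuts the agent off from its hosted model API; if the agent
# must reach one, use a network that allows egress only to that API host instead
docker run --rm -it --network=none \
  -v "$(pwd):/workspace" -w /workspace \
  node:22-slim npx @anthropic-ai/claude-code --dangerously-skip-permissions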
Git & Deployment
An agent that can push to main and trigger a deploy pipeline can ship broken code to production. Enforce branch isolation and require CI gates between agent code and production.
Block force-push to main and production branches
A force-push to main can overwrite the team’s work and trigger auto-deploy pipelines. Agents should never have this capability.
// .claude/settings.json
{ "permissions": { "deny": [
"Bash(git push --force*)", "Bash(git push -f*)",
"Bash(git push origin main*)", "Bash(git push origin master*)",
"Bash(git push origin prod*)", "Bash(git reset --hard*)"
] } }
// Also: enable branch protection rules on GitHub/GitLab — server-side is the real gate
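Deny rules match the command text, so a bare `git push` from a branch that tracks main would not be caught by them. A client-side git pre-push hook sees the actual refs being updated. The sketch below shows the core check, assuming a hypothetical function name, `protected_refs_in_push`, and the same protected branch names as the deny list above. Git feeds a pre-push hook one line per ref on stdin, in the form `<local ref> <local sha> <remote ref> <remote sha>`.

```shell
# Sketch: read git's pre-push stdin lines and print every protected remote
# ref that the push would update. Prints nothing when the push is safe.
protected_refs_in_push() (
  while read -r local_ref local_sha remote_ref remote_sha; do
    case "$remote_ref" in
      refs/heads/main|refs/heads/master|refs/heads/prod*) echo "$remote_ref" ;;
    esac
  done
)

# Possible usage in .git/hooks/pre-push:
#   blocked=$(protected_refs_in_push)
#   [ -z "$blocked" ] || { echo "BLOCKED: push to $blocked" >&2; exit 1; }
```

A local hook can be skipped with `git push --no-verify`, so this complements server-side branch protection rather than replacing it.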
Require branch-per-task for agent work
Agents that commit directly to main can step on each other’s changes and ship untested code. Each agent task gets its own branch (or git worktree). Code reaches main only through a reviewed PR.
# In CLAUDE.md:
# NEVER commit directly to main. Always create a feature branch first:
# git checkout -b agent/<task-description>
# Open a PR when done. Do not merge without review.
# Or use Claude Code's built-in worktree isolation:
claude --worktree "implement the new auth flow"
Block direct deployment commands
Commands such as vercel deploy --prod, fly deploy, or kubectl apply to production namespaces should never be run by an agent. Deploy through CI, not through the agent.
// .claude/settings.json
{ "permissions": { "deny": [
"Bash(vercel deploy --prod*)", "Bash(fly deploy*)",
"Bash(kubectl apply*-n prod*)", "Bash(kubectl apply*--namespace prod*)",
"Bash(eb deploy*)", "Bash(gcloud app deploy*)",
"Bash(serverless deploy --stage prod*)", "Bash(cdk deploy*)"
] } }
Require CI to pass before merge
# GitHub: Settings → Branches → Branch protection rules
# ✓ Require status checks to pass before merging
# ✓ Require branches to be up to date before merging
# ✓ Required checks: test, lint, security-scan
# ✓ Require a pull request before merging (1+ approvals)
Pin dependencies — block wildcard version bumps
Agents sometimes set dependency versions to * or latest. This opens you to supply chain attacks. Pin versions and block npm install without --save-exact.
#!/bin/bash
# Block if package.json contains "*" or "latest" as a version
INPUT=$(cat) # drain the hook payload; this check only inspects the staged package.json
if git diff --cached --name-only 2>/dev/null | grep -q 'package.json'; then
  if grep -qE '"(\*|latest)"' package.json; then
    echo "BLOCKED: wildcard or latest version in package.json" >&2
    exit 2
  fi
fi
Cost Control
A major company burned its entire 2026 AI budget by April. A single developer hit $4,200 in API fees over one weekend. Agents burn 50x more tokens than chat because each loop iteration resends the full context. Cost control is safety.
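The "resends the full context" point can be made concrete with a little arithmetic. The sketch below uses a simple model that is an assumption, not a measurement: the context starts at a fixed size and grows by a fixed number of tokens each turn, and every turn sends the whole context as input. The function name `total_input_tokens` and the example numbers are illustrative, and prompt caching or context compaction would change the real figures.

```shell
# Sketch: cumulative input tokens over a session.
# usage: total_input_tokens BASE GROWTH TURNS
#   BASE   = tokens in the context on the first turn
#   GROWTH = tokens added to the context by each turn
#   TURNS  = number of agent loop iterations
total_input_tokens() (
  base=$1; growth=$2; turns=$3
  total=0; i=0
  while [ "$i" -lt "$turns" ]; do
    # Turn i resends the base context plus everything added so far.
    total=$(( total + base + i * growth ))
    i=$(( i + 1 ))
  done
  echo "$total"
)

total_input_tokens 20000 2000 25    # prints 1100000
total_input_tokens 20000 2000 100   # prints 11900000
```

Under these assumptions, four times as many turns costs roughly eleven times as many input tokens, which is why a turn cap is an effective cost control.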
Set a per-session turn limit
# Limit to 25 turns per session
claude --max-turns 25 "implement the user profile page"
# For Codex CLI: use the built-in iteration limits
codex --max-iterations 20 "fix the authentication bug"
Set API-level monthly spend caps
# Anthropic: console.anthropic.com → Billing → Usage limits
# OpenAI: platform.openai.com → Settings → Limits
# Google Cloud: Set budget alerts + auto-disable on Billing → Budgets
# Rule of thumb: set the cap at 1.5× your expected monthly spend
# Review and adjust monthly; never set "unlimited"
Alert on per-session cost thresholds
# amux tracks per-session token spend in real-time via the dashboard
# Set alerts at your threshold, e.g., kill sessions over $50:
# The self-healing watchdog can auto-kill sessions exceeding cost limits
# For standalone Claude Code, check spend via:
claude --api-usage # shows current session token count
Use model routing to match cost to task complexity
# Simple tasks → cheaper model
claude --model haiku "rename userId to user_id across the codebase"
# Complex tasks → full model
claude --model opus "redesign the authentication system to support OAuth2"
# See our cost optimization guide: /guides/cost-optimization/
Kill idle sessions automatically
# amux detects stuck sessions automatically:
# - Prompt stuck (waiting for input with no progress)
# - Context exhaustion (near context window limit)
# - Crash recovery (agent process died)
# Configure in amux: self-healing runs every 30s by default
# See: /guides/self-healing-configuration/
Monitoring & Kill Switches
You can’t prevent what you can’t see. Monitoring is the safety net under all other rules. When everything else fails, the kill switch is what saves you.
Monitor every agent session in real time
# amux: web dashboard shows all sessions with live output streaming
# Open https://localhost:8822 or use the iOS app
# Without amux, use tmux panes:
tmux split-window -h 'tail -f ~/.claude/projects/*/session.log'
# Or peek at any session via amux API:
curl -sk "$AMUX_URL/api/sessions/SESSION_NAME/peek?lines=50"
Log all agent actions to an append-only store
# Forward amux session logs to S3 with Object Lock (append-only):
aws s3 cp ~/.claude/projects/ s3://your-audit-bucket/agent-logs/ \
  --recursive --object-lock-mode COMPLIANCE --object-lock-retain-until-date 2027-05-19
# Or pipe to your SIEM (Datadog, Splunk, ELK):
# amux board history provides a structured task audit trail via /api/board
Set up crash detection and auto-recovery
# amux's watchdog monitors every session and detects:
# - Process crashes → auto-restart with original prompt
# - Context exhaustion → compact and continue
# - Stuck prompts → auto-dismiss and resume
# - Permission dialogs → respond based on policy
# Configure: /guides/self-healing-configuration/
# All recoveries are logged and visible in the dashboard
Run a daily audit of agent-generated changes
# Morning review: what did agents do overnight?
git log --oneline --since='12 hours ago' --all
git diff main..HEAD --stat
# Check for removed safety checks:
git diff main..HEAD | grep -E '^\-.*\b(auth|permission|validate|sanitize|escape)\b'
# See our review guide: /guides/reviewing-ai-agent-code/
Have a one-command kill switch for all agents
# amux: kill all sessions from CLI, dashboard, or phone
curl -sk -X POST "$AMUX_URL/api/sessions/kill-all"
# tmux: kill the entire agent server group
tmux kill-server
# Nuclear option: kill all Claude Code processes
pkill -f "claude" && echo "All agent processes terminated"
# Tip: save this as an alias you can run from anywhere:
alias killall-agents='curl -sk -X POST "$AMUX_URL/api/sessions/kill-all"'
Safety by Tool: Who Protects What?
Different AI coding tools provide different safety mechanisms. Here’s what each gives you out of the box versus what you need to add yourself.
| Safety Feature | Claude Code | Codex CLI | Cursor | Gemini CLI | amux |
|---|---|---|---|---|---|
| Permission system | ✓ allow/deny rules | ✓ sandbox modes | ○ confirmation dialogs | ○ confirmation prompts | ✓ via agent hooks |
| PreToolUse hooks | ✓ full hook system | ✗ | ✗ | ✗ | ✓ via Claude Code |
| Sandbox by default | ✗ opt-in Docker | ✓ network-disabled | ✗ | ✗ | ✗ opt-in |
| Credential deny rules | ✓ file path deny | ✓ sandbox isolation | ✗ | ✗ | ✓ via hooks |
| Cost limits | ○ max-turns flag | ○ iteration limit | ○ usage-based billing | ✓ free 1k/day | ✓ per-session tracking |
| Live monitoring | ✗ terminal only | ✗ terminal only | ○ IDE only | ✗ terminal only | ✓ web + mobile + API |
| Crash recovery | ✗ | ✗ | ✗ | ✗ | ✓ self-healing watchdog |
| Kill all agents | ✗ manual per-session | ✗ manual | ✗ manual | ✗ manual | ✓ one-command kill-all |
| Audit trail | ○ session files | ○ session logs | ○ IDE history | ○ session logs | ✓ board + session logs |
FAQ
Why do AI coding agents cause production incidents?
AI coding agents combine high autonomy with low judgment about blast radius. In the PocketOS incident, a Cursor agent found a root-level API token and used it to delete a production database in 9 seconds. The agent had the capability to destroy infrastructure but lacked the judgment to assess consequences. Safety comes from constraining what agents can do (permissions, hooks, sandboxing), not from hoping they’ll choose wisely.
What is the most important safety rule?
Rule 1: never give agents access to production credentials. The majority of AI agent disasters trace back to the agent finding credentials with more power than the task required. Store production secrets in a vault (AWS Secrets Manager, HashiCorp Vault, 1Password CLI), never in .env files the agent can read.
How do I prevent an AI agent from deleting my database?
Three layers: (1) Never store production database credentials where agents can read them. (2) Use a PreToolUse hook to block destructive SQL and CLI commands (exit code 2 for hard blocks). (3) Run agents in a sandbox with no network access to production hosts. Any single layer failing is survivable. All three failing is the PocketOS scenario.
Do these rules work with Cursor, Codex, and other tools?
The principles are universal. The enforcement differs: Claude Code uses hooks and permission rules. Codex CLI uses its built-in sandbox. Cursor has confirmation dialogs but no hooks — you rely on prompt-level caution. For any tool, filesystem permissions, Docker sandboxing, and credential isolation work regardless of which agent you run. See the tool comparison table above.
Should I use YOLO mode or --dangerously-skip-permissions?
Only with compensating controls. YOLO mode and --dangerously-skip-permissions remove the human-in-the-loop gate. This is fine if you have all three of the following: (1) PreToolUse hooks that hard-block destructive operations (hooks run even with --dangerously-skip-permissions), (2) a sandboxed environment for the agent, and (3) monitoring that can kill the session if it goes off the rails. Without all three, these modes are how the PocketOS incident happened. See our YOLO mode guide.
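Two of the three compensating controls can be checked mechanically before a skip-permissions session starts. The sketch below is a crude preflight under stated assumptions: the function name `yolo_preflight` is hypothetical, the hook check only looks for the string "PreToolUse" in the settings file (it does not validate the hooks themselves), and the sandbox check only tests for a marker file such as /.dockerenv, which is typically present inside Docker containers. Monitoring has to be verified separately.

```shell
# Sketch: print one line per missing control; print nothing when both are present.
# usage: yolo_preflight SETTINGS_FILE CONTAINER_MARKER
yolo_preflight() (
  settings=$1; marker=$2
  # Control 1: some PreToolUse hook configuration exists in the settings file.
  grep -q '"PreToolUse"' "$settings" 2>/dev/null \
    || echo "missing: PreToolUse hooks in $settings"
  # Control 2: the marker file that signals a sandboxed environment exists.
  [ -e "$marker" ] || echo "missing: sandbox marker $marker"
  true
)

# Possible usage in a launcher script:
#   problems=$(yolo_preflight .claude/settings.json /.dockerenv)
#   [ -z "$problems" ] || { echo "$problems" >&2; exit 1; }
```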
How do I set a cost limit on AI coding agents?
Three levels: (1) API level — monthly spend caps in your Anthropic, OpenAI, or Google Cloud billing dashboard. (2) Session level — --max-turns flag to cap iterations. (3) Fleet level — amux tracks per-session spend in real-time and can kill sessions exceeding thresholds. A 15–25 turn limit prevents a single loop from becoming a $4,000 weekend.
How do I audit what an AI coding agent did?
Three sources: (1) Git history — every file change is in the diff. (2) Session logs — Claude Code stores conversation history in ~/.claude/projects/. (3) Orchestration logs — amux captures full session output and provides a board audit trail via /api/board. For compliance (EU AI Act, SOC 2), pipe logs to an append-only store.
Keep exploring
- AI Agent Security Hardening Guide — the 6-layer defense model and 15-point hardening checklist
- AI Agent Sandboxing — Docker, E2B, Firecracker, gVisor, Modal & Daytona compared
- Claude Code Hooks Cookbook — 20 production-ready hook recipes
- Setting Up YOLO Mode Safely — how to skip permissions without skipping safety
- Reviewing AI-Generated Code — the morning review playbook for agent PRs
- Cost Optimization Guide — model routing, context discipline, and spend tracking
- Self-Healing Configuration — auto-compaction, restart, and prompt recovery
- AI Coding Tools Landscape 2026 — every agent, IDE, and platform compared