Watchdog API

The watchdog monitors agent gateway processes for health and automatically recovers from failures. It detects crash loops, performs auto-repair, and sends notifications via Telegram and Discord.

How It Works

Gateway running
      │
      ▼
Health check every 2 min
      │
      ├── Healthy → continue monitoring
      │
      └── Unhealthy
           │
           ├── First failure → mark degraded
           ├── Retry every 5 sec
           ├── 3 failures → trigger repair
           │
           ▼
      Auto-repair
           │
           ├── Kill gateway process
           ├── Wait 5 seconds
           ├── Restart gateway
           ├── Verify health
           │
           ├── Success → resume normal monitoring
           └── Failure → retry (max 2 attempts)
                              │
                              └── Crash loop detected → notify + stop
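
The flow above can be read as a small state machine. The following is a minimal sketch of that reading, not the watchdog's actual code: the function names and the shape of the state object are hypothetical, and it assumes that the crash loop is declared once the second repair attempt fails.

```javascript
// Hypothetical sketch of the monitoring flow; names are illustrative only.
const FAILURES_BEFORE_REPAIR = 3; // "3 failures → trigger repair"
const MAX_REPAIR_ATTEMPTS = 2;    // "retry (max 2 attempts)"

// Bookkeeping for a healthy gateway.
function initialState() {
  return { state: 'running', failures: 0, repairAttempts: 0 };
}

// Apply one health-check result while monitoring (running or degraded).
function onHealthCheck(s, healthy) {
  if (healthy) return initialState();
  const failures = s.failures + 1;
  if (failures >= FAILURES_BEFORE_REPAIR) {
    return { ...s, state: 'repairing', failures };
  }
  return { ...s, state: 'degraded', failures };
}

// Apply the outcome of one repair attempt (kill, wait, restart, verify).
function onRepairResult(s, success) {
  if (success) return initialState();
  const repairAttempts = s.repairAttempts + 1;
  if (repairAttempts >= MAX_REPAIR_ATTEMPTS) {
    // Assumption: exhausting the attempts is what the diagram calls a crash loop.
    return { ...s, state: 'crash_loop', repairAttempts };
  }
  return { ...s, state: 'repairing', repairAttempts };
}
```

Keeping the transitions as pure functions makes the thresholds easy to unit-test without starting a gateway; a real watchdog would wrap them in timers that fire at the normal and degraded check intervals.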

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| WATCHDOG_CHECK_INTERVAL | 120 | Health check interval in seconds |
| WATCHDOG_DEGRADED_CHECK_INTERVAL | 5 | Retry interval when degraded (seconds) |
| WATCHDOG_MAX_REPAIR_ATTEMPTS | 2 | Max auto-repair attempts before crash-loop |
| WATCHDOG_CRASH_LOOP_WINDOW | 300 | Window for crash-loop detection (seconds) |
| WATCHDOG_CRASH_LOOP_THRESHOLD | 3 | Crashes in window to trigger crash-loop |
| WATCHDOG_AUTO_REPAIR | true | Enable/disable auto-repair |
| TELEGRAM_BOT_TOKEN | | Telegram bot token for notifications |
| TELEGRAM_ADMIN_CHAT_ID | | Chat ID for Telegram notifications |
| DISCORD_WEBHOOK_URL | | Discord webhook URL for notifications |
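
As an illustration of how these variables and defaults fit together, here is a hedged sketch of a loader that reads them from an environment object. The function `loadWatchdogConfig` and the property names it returns are hypothetical. How the watchdog treats invalid values, and which strings it accepts for `WATCHDOG_AUTO_REPAIR`, are assumptions.

```javascript
// Hypothetical loader: reads the documented variables from an env object.
function loadWatchdogConfig(env = process.env) {
  // Parse a positive integer, falling back to the documented default.
  const int = (name, fallback) => {
    const n = Number.parseInt(env[name] ?? '', 10);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  return {
    checkIntervalSec: int('WATCHDOG_CHECK_INTERVAL', 120),
    degradedCheckIntervalSec: int('WATCHDOG_DEGRADED_CHECK_INTERVAL', 5),
    maxRepairAttempts: int('WATCHDOG_MAX_REPAIR_ATTEMPTS', 2),
    crashLoopWindowSec: int('WATCHDOG_CRASH_LOOP_WINDOW', 300),
    crashLoopThreshold: int('WATCHDOG_CRASH_LOOP_THRESHOLD', 3),
    // Assumption: only the string "false" (any case) turns auto-repair off.
    autoRepair: (env.WATCHDOG_AUTO_REPAIR ?? 'true').toLowerCase() !== 'false',
    // The notification settings have no documented default, so null means "unset".
    telegramBotToken: env.TELEGRAM_BOT_TOKEN ?? null,
    telegramAdminChatId: env.TELEGRAM_ADMIN_CHAT_ID ?? null,
    discordWebhookUrl: env.DISCORD_WEBHOOK_URL ?? null,
  };
}
```

Passing a plain object instead of `process.env`, for example `loadWatchdogConfig({ WATCHDOG_CHECK_INTERVAL: '30' })`, makes it easy to try out overrides in isolation.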

Lifecycle States

| State | Description |
| --- | --- |
| stopped | Gateway is not running |
| starting | Gateway process started, waiting for health check |
| running | Gateway is healthy |
| degraded | Health check failed, retrying |
| crash_loop | Too many crashes, auto-repair exhausted |
| repairing | Auto-repair in progress |

Notifications

The watchdog sends notifications for:
  • Crash detected — Gateway process exited unexpectedly
  • Crash loop — 3+ crashes in 5-minute window
  • Auto-repair started — Attempting to restart the gateway
  • Auto-repair succeeded — Gateway is healthy again
  • Auto-repair failed — Could not recover, needs manual intervention
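
The "3+ crashes in 5-minute window" rule can be pictured as a sliding-window counter. The sketch below is one possible way to implement it, under the assumption that only crash timestamps are tracked. The class `CrashLoopDetector` is hypothetical and is not taken from the watchdog's source.

```javascript
// Hypothetical sliding-window crash counter; not the watchdog's actual implementation.
class CrashLoopDetector {
  // Defaults mirror WATCHDOG_CRASH_LOOP_WINDOW and WATCHDOG_CRASH_LOOP_THRESHOLD.
  constructor(windowSec = 300, threshold = 3) {
    this.windowMs = windowSec * 1000;
    this.threshold = threshold;
    this.crashes = []; // timestamps (ms) of recent crashes, oldest first
  }

  // Record a crash at `nowMs` and report whether the threshold has been reached.
  recordCrash(nowMs = Date.now()) {
    this.crashes.push(nowMs);
    // Keep only crashes that are still inside the window.
    const cutoff = nowMs - this.windowMs;
    this.crashes = this.crashes.filter((t) => t > cutoff);
    return this.crashes.length >= this.threshold;
  }
}
```

Accepting the timestamp as a parameter keeps the detector deterministic in tests; in production the default `Date.now()` would be used.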

Telegram Format

🐺 Agentbot Watchdog
🔴 Crash loop detected - View logs
Trigger: `crash_loop`
Attempt count: 3
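
For illustration, a message in this format could be delivered through the Telegram Bot API `sendMessage` method. The helper below is a sketch under assumptions: `buildTelegramRequest` and the fields of its `event` argument are hypothetical, and it sends plain text without a `parse_mode`, so the backticks appear literally.

```javascript
// Hypothetical helper: builds a Telegram Bot API sendMessage request.
// botToken and chatId correspond to TELEGRAM_BOT_TOKEN and TELEGRAM_ADMIN_CHAT_ID.
function buildTelegramRequest(botToken, chatId, event) {
  // Assemble the four-line message shown in the format above.
  const text = [
    '🐺 Agentbot Watchdog',
    `${event.icon} ${event.summary}`,
    `Trigger: \`${event.trigger}\``,
    `Attempt count: ${event.attemptCount}`,
  ].join('\n');
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ chat_id: chatId, text }),
    },
  };
}
```

On Node 18 or later, the result can be passed straight to the built-in `fetch`: `const { url, options } = buildTelegramRequest(token, chatId, event); await fetch(url, options);`.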

Discord Format

Embedded message with color coding:
  • 🔴 Red: crashes, failures
  • 🟡 Yellow: repairing, degraded
  • 🟢 Green: recovered, healthy
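
A Discord webhook accepts a JSON body with an `embeds` array whose `color` field is an integer RGB value. The sketch below shows how the color coding above might map onto such a payload. The event type names, the exact hex values, and the function `buildDiscordPayload` are illustrative assumptions and not the watchdog's real identifiers.

```javascript
// Hypothetical severity colors; the hex values are illustrative choices.
const EMBED_COLORS = {
  red: 0xe74c3c,
  yellow: 0xf1c40f,
  green: 0x2ecc71,
};

// Assumed event names, grouped according to the color list above.
const EVENT_SEVERITY = {
  crash: 'red',
  crash_loop: 'red',
  repair_failed: 'red',
  repairing: 'yellow',
  degraded: 'yellow',
  recovered: 'green',
  healthy: 'green',
};

// Build the JSON body for a POST to DISCORD_WEBHOOK_URL.
function buildDiscordPayload(eventType, description) {
  const severity = EVENT_SEVERITY[eventType];
  if (!severity) throw new Error(`Unknown event type: ${eventType}`);
  return {
    embeds: [
      { title: 'Agentbot Watchdog', description, color: EMBED_COLORS[severity] },
    ],
  };
}
```

The payload would be sent with a `Content-Type: application/json` header, for example `fetch(webhookUrl, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(payload) })` on Node 18 or later.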

Test Script

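Test the watchdog by injecting an invalid config into a running container:

#!/bin/bash
set -euo pipefail

# Replace <userId> with the ID of the user whose container you want to test.
CONTAINER_NAME="agentbot-<userId>"

# Point hooks.transformDir at a path that does not exist, which makes the config invalid.
docker exec "$CONTAINER_NAME" node -e "
  const fs = require('fs');
  const path = '/data/.openclaw/openclaw.json';
  const cfg = JSON.parse(fs.readFileSync(path, 'utf8'));
  cfg.hooks = cfg.hooks || {};
  cfg.hooks.transformDir = '/tmp/does-not-exist';
  fs.writeFileSync(path, JSON.stringify(cfg, null, 2));
"

The watchdog should detect the gateway failure and auto-repair within 2 health check cycles (about 4 minutes at the default WATCHDOG_CHECK_INTERVAL of 120 seconds).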