
Fallback for misfiring cron one-shots: chained jobs with a watchdog and a manual recovery path

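Problem/scenario: community reports indicate that `schedule.kind=at` jobs can get stuck in a past-due state after sustained running. Prerequisite: you depend on chained cron one-shot jobs. Steps: add state checks and a watchdog to the chain → on detecting a stale job, alert immediately and switch to `sessions_spawn` for recovery → clean up stale jobs. Key command: `openclaw gateway status`. Verification: the watchdog identifies a stuck chain within its window and recovers automatically. Risk: this is a temporary mitigation; the release containing the root-cause fix still needs to be tracked. Source: Issue #20586.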

GitHub · Discovered 2026-02-19 · Author: sauravbhattacharya001
Prerequisites
  • You already run deferred jobs through cron one-shot (`schedule.kind: at`) in production-like loops.
  • You can inspect cron state and are able to trigger fallback runs when a chain stalls.
Steps
  1. Persist the expected next-run timestamp for each chain in a state file, so it can be compared against cron state from outside the scheduler.
  2. Run a periodic watchdog that flags jobs whose `nextRunAt` has been in the past for longer than a chosen grace period.
  3. When a stale job is detected, send an alert and trigger an immediate fallback execution via `sessions_spawn`.
  4. Clean up the stale one-shot jobs to avoid a duplicated backlog, then restart the normal scheduling chain.
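
Steps 1 to 3 could be sketched roughly as follows. This is a minimal illustration, not the original author's implementation: the JSON state-file layout, the `STALE_AFTER_SECONDS` grace period, and the `alert` and `run_fallback` callbacks are all hypothetical. How `sessions_spawn` is actually invoked is not specified in the source, so it is left behind the `run_fallback` hook.

```python
import json
import time
from pathlib import Path

# Assumed grace period; tune it to the cadence of your own chain.
STALE_AFTER_SECONDS = 300


def record_expected_run(state_path: Path, chain_id: str, next_run_at: float) -> None:
    """Persist the expected next-run time (epoch seconds) for one chain."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state[chain_id] = {"nextRunAt": next_run_at}
    state_path.write_text(json.dumps(state, indent=2))


def find_stale_chains(state_path: Path, now: float,
                      grace: float = STALE_AFTER_SECONDS) -> list[str]:
    """Return ids of chains whose nextRunAt is more than `grace` seconds in the past."""
    if not state_path.exists():
        return []
    state = json.loads(state_path.read_text())
    return sorted(cid for cid, entry in state.items()
                  if now - entry["nextRunAt"] > grace)


def watchdog_pass(state_path: Path, alert, run_fallback, now=None) -> list[str]:
    """Run one watchdog cycle: alert and trigger the fallback for each stale chain."""
    now = time.time() if now is None else now
    stale = find_stale_chains(state_path, now)
    for chain_id in stale:
        alert(f"cron chain {chain_id} is stale")
        # Hypothetical hook: wrap your own sessions_spawn call here.
        run_fallback(chain_id)
    return stale
```

Calling `watchdog_pass` from a separate timer (not from the cron chain being watched) keeps the check independent of the scheduler that may have stalled.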
Commands
openclaw gateway status
Verify

If a one-shot chain stalls, the watchdog raises an alert within one watchdog cycle and the workload resumes through the fallback path.

Caveats
  • This is a mitigation pattern, not an upstream bug fix; keep tracking official patch progress.
  • Aggressive watchdog intervals can create duplicate executions if the jobs are not designed to be idempotent (needs verification).
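
One way to reduce the duplicate-execution risk is to have both the normal path and the fallback path claim a run before doing any work. The sketch below is an assumption-laden illustration rather than part of the original tip: `claim_run`, the claim directory, and the `<chain_id>-<scheduled_for>.claim` naming are hypothetical, and it assumes `chain_id` contains no path separators and that all runners share one local filesystem.

```python
import os
from pathlib import Path


def claim_run(claim_dir: Path, chain_id: str, scheduled_for: int) -> bool:
    """Atomically claim one (chain, scheduled time) pair.

    Returns True only for the first caller; later callers get False
    and should skip the work.
    """
    claim_dir.mkdir(parents=True, exist_ok=True)
    marker = claim_dir / f"{chain_id}-{scheduled_for}.claim"
    try:
        # O_CREAT | O_EXCL fails if the marker already exists, so only
        # one caller can create it.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

If the stalled one-shot later fires after the fallback has already run, its `claim_run` call returns False and the job can exit without repeating the work.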
Source attribution

This tip is aggregated from community/public sources and preserved with attribution.
