Fallback for cron one-shot misfires: add a watchdog and a manual recovery path to chained jobs

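Problem/scenario: community reports indicate that `schedule.kind=at` jobs can get stuck in a past-due state after running continuously for a while. Prerequisite: you depend on chained cron one-shot jobs. Steps: add state checks and a watchdog to the chain → on detecting a stale job, alert immediately and switch to `sessions_spawn` for recovery → clean up stale jobs. Key command: `openclaw gateway status`. Verification: the watchdog identifies the stall within its window and recovers automatically. Risk: this is a temporary mitigation; the release that fixes the root cause still needs to be tracked. Source: Issue #20586.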

Source: GitHub. Discovered: 2026-02-19. Author: sauravbhattacharya001

Prerequisites

You already run deferred jobs through cron one-shot (`schedule.kind: at`) in production-like loops.
You can inspect cron state and trigger fallback runs when a chain stalls.

Steps

Persist the expected next-run timestamp for each chain in a state file so that an external process can compare it against the current time.
Run a periodic watchdog that flags jobs whose `nextRunAt` has been in the past for longer than a grace period you choose.
When a stale job is detected, send an alert and trigger an immediate fallback execution via `sessions_spawn`.
Clean up stale one-shot jobs to avoid a duplicated backlog, then restart the normal scheduling chain.
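
The first three steps can be sketched as a small external watchdog. This is a minimal sketch under stated assumptions, not the gateway's own mechanism: the JSON state file layout, the `record_next_run`, `find_stale_chains`, and `run_watchdog_cycle` helpers, the 300-second grace period, and the use of epoch seconds for `nextRunAt` are all hypothetical. The alert and fallback actions are passed in as callables because the source does not document how `sessions_spawn` is invoked.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

# Assumption: how long a job may sit past due before it counts as stale.
STALE_GRACE_SECONDS = 300.0


def record_next_run(state_path: Path, chain_id: str, next_run_at: float) -> None:
    """Persist the expected next-run time (epoch seconds) for one chain."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state[chain_id] = {"nextRunAt": next_run_at}
    state_path.write_text(json.dumps(state, indent=2))


def find_stale_chains(
    state: dict, now: float, grace: float = STALE_GRACE_SECONDS
) -> list[str]:
    """Return ids of chains whose nextRunAt is more than `grace` seconds past."""
    return sorted(
        chain_id
        for chain_id, entry in state.items()
        if now - entry["nextRunAt"] > grace
    )


def run_watchdog_cycle(
    state_path: Path,
    alert: Callable[[str], None],
    fallback: Callable[[str], None],
    now: Optional[float] = None,
    grace: float = STALE_GRACE_SECONDS,
) -> list[str]:
    """Run one watchdog pass: alert on and recover every stale chain."""
    now = time.time() if now is None else now
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    stale = find_stale_chains(state, now, grace)
    for chain_id in stale:
        alert(chain_id)     # e.g. post to a chat channel or pager
        fallback(chain_id)  # hypothetical wrapper around sessions_spawn
    return stale
```

Each job would call `record_next_run` when it schedules its successor, and a separate scheduler (not the cron chain being watched) would call `run_watchdog_cycle` on a fixed interval. The fallback should also refresh the chain's entry in the state file, otherwise the same chain is flagged again on the next pass.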

Commands

openclaw gateway status

Verify

If a one-shot chain stalls, the watchdog alerts within one watchdog cycle and the workload resumes through the fallback path.

Caveats

This is a mitigation pattern, not an upstream bug fix; keep tracking progress on the official patch.
Aggressive watchdog intervals can cause duplicate executions if the jobs are not designed to be idempotent (needs verification).
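
One way to reduce the duplicate-execution risk is to have both the normal path and the fallback path claim a run before executing it. The sketch below assumes a shared local directory and a hypothetical `claim_run` helper keyed on chain id plus scheduled time; it relies on the operating system's exclusive-create semantics so that only the first claimant proceeds. It is an illustration of the idea, not something described in the original issue.

```python
import os
from pathlib import Path


def claim_run(claims_dir: Path, chain_id: str, scheduled_at: int) -> bool:
    """Atomically claim one scheduled run; return False if already claimed.

    Assumption: chain_id is a simple name with no path separators, and
    scheduled_at is the run's scheduled time in epoch seconds.
    """
    claims_dir.mkdir(parents=True, exist_ok=True)
    marker = claims_dir / f"{chain_id}-{scheduled_at}.claim"
    try:
        # O_CREAT | O_EXCL fails if the marker file already exists.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

A job body would start with `if not claim_run(...): return`, so a late-firing one-shot and a watchdog-triggered fallback for the same scheduled time do not both do the work.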

Source attribution

This tip is aggregated from community/public sources and preserved with attribution.
