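Heartbeat runOnce exception isolation and self-healing (PR #14901)
Addresses the case where the whole scheduler stalls after a single heartbeat task throws: runOnce gains error isolation so that subsequent cycles keep running.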
GITHUB · Discovered 2026-02-13 · Author shtse8
Prerequisites
- Your OpenClaw version already includes PR #14901 (or an equivalent heartbeat scheduler fix).
- You can access the heartbeat logs and manually trigger a task run.
Steps
- Identify one heartbeat task that fails intermittently and capture its stack trace signature.
- Upgrade to a build that includes PR #14901 and restart the gateway in a low-risk window.
- Inject a controlled failure into the runOnce path (or replay a known bad input) to confirm that the scheduler survives.
- Observe the next 3-5 heartbeat cycles and verify that subsequent jobs still trigger on schedule.
- Keep alerting on repeated runOnce errors, but do not make a process-level restart your first reaction.
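The isolation behaviour these steps exercise can be sketched as follows. This is a minimal illustration, not OpenClaw's actual code: `HeartbeatScheduler`, `run_once`, `flaky_task`, and `alert_threshold` are hypothetical names, and a real scheduler would wait for its interval between cycles rather than loop immediately.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("heartbeat_sketch")


class HeartbeatScheduler:
    """Hypothetical scheduler used only to illustrate per-cycle error isolation."""

    def __init__(self, task: Callable[[int], None], alert_threshold: int = 3) -> None:
        self.task = task
        self.alert_threshold = alert_threshold
        self.consecutive_failures = 0
        self.completed: List[int] = []  # cycles that finished normally
        self.failed: List[int] = []     # cycles where the task raised
        self.alerts: List[int] = []     # cycles where the alert threshold was reached

    def run_once(self, cycle: int) -> None:
        """Run one cycle; an exception is recorded here and never propagates."""
        try:
            self.task(cycle)
        except Exception:
            # Isolation boundary: log and count the failure, then return normally
            # so the caller's loop moves on to the next cycle.
            self.consecutive_failures += 1
            self.failed.append(cycle)
            logger.exception("heartbeat cycle %d failed", cycle)
            if self.consecutive_failures >= self.alert_threshold:
                self.alerts.append(cycle)
        else:
            self.consecutive_failures = 0
            self.completed.append(cycle)

    def run(self, cycles: int) -> None:
        for cycle in range(cycles):
            self.run_once(cycle)


def flaky_task(cycle: int) -> None:
    # Controlled failure injected on cycle 1 only.
    if cycle == 1:
        raise RuntimeError("injected failure")


scheduler = HeartbeatScheduler(flaky_task)
scheduler.run(5)
print(scheduler.completed, scheduler.failed)
```

Cycle 1 fails and is logged, yet cycles 2 through 4 still run. The consecutive-failure counter is one way to keep alerting on repeated errors without restarting the process.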
Commands
openclaw gateway status
openclaw gateway restart
openclaw status
Verify
After a forced runOnce failure, the heartbeat scheduler keeps running and the next intervals execute on schedule.
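One way to make this check mechanical is to compare the gaps between observed heartbeat times against the configured interval. The sketch below assumes you have already extracted tick timestamps (in seconds) from your own logs; the helper `missed_intervals`, the `tolerance` parameter, and the sample values are hypothetical, and no particular OpenClaw log format is assumed.

```python
from typing import List


def missed_intervals(timestamps: List[float], interval: float, tolerance: float = 0.5) -> List[int]:
    """Return each index i where the gap from tick i to tick i+1 exceeds interval * (1 + tolerance)."""
    limit = interval * (1 + tolerance)
    return [
        i
        for i in range(len(timestamps) - 1)
        if timestamps[i + 1] - timestamps[i] > limit
    ]


# Hypothetical tick times, in seconds, for a 60-second heartbeat.
healthy_ticks = [0.0, 60.0, 120.5, 180.2, 240.1]
stalled_ticks = [0.0, 60.0, 300.0]

print(missed_intervals(healthy_ticks, interval=60.0))
print(missed_intervals(stalled_ticks, interval=60.0))
```

An empty result over the 3-5 cycles after the injected failure suggests the scheduler kept its cadence; a non-empty result points at the tick after which a gap opened.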
Caveats
- Error isolation prevents scheduler death, but does not fix the root cause of the failing task.
- If failures have side effects (e.g., partial writes), add idempotency guards separately (needs verification).
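An idempotency guard of the kind mentioned above could look like this. It is a minimal in-memory sketch under stated assumptions: `IdempotentWriter`, `apply`, and the `cycle-7` key are hypothetical, and a real guard would need durable storage for the applied keys. Note that it only prevents re-applying a write that completed; a write that fails halfway is not marked and will be retried in full, so the write itself still has to tolerate that.

```python
from typing import Callable, List, Set


class IdempotentWriter:
    """Hypothetical guard: applies a write at most once per key (in memory only)."""

    def __init__(self) -> None:
        self.applied_keys: Set[str] = set()

    def apply(self, key: str, write: Callable[[], None]) -> bool:
        """Run write() unless key was already applied; return True if it ran."""
        if key in self.applied_keys:
            return False
        write()
        # Mark the key only after the write returned without raising.
        self.applied_keys.add(key)
        return True


ledger: List[str] = []
writer = IdempotentWriter()

# The same cycle is replayed twice, e.g. after a retry.
first = writer.apply("cycle-7", lambda: ledger.append("cycle-7 result"))
second = writer.apply("cycle-7", lambda: ledger.append("cycle-7 result"))
print(first, second, ledger)
```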
Source attribution
This tip is aggregated from community/public sources and preserved with attribution.