
Heartbeat runOnce error isolation and self-recovery (PR #14901)

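Addresses the case where the whole scheduler stalls after a single heartbeat task throws: error isolation is added to runOnce so that subsequent cycles keep running.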

GITHUB | Discovered 2026-02-13 | Author shtse8
Prerequisites
  • Your OpenClaw version already includes PR #14901 (or an equivalent heartbeat scheduling fix).
  • You can access heartbeat-related logs and manually trigger a task run.
Steps
  1. Identify one heartbeat task that occasionally fails and capture its stack trace signature.
  2. Upgrade to a build that includes PR #14901 and restart the gateway in a low-risk window.
  3. Inject a controlled failure in the runOnce path (or replay a known bad input) to confirm the scheduler survives.
  4. Observe the next 3-5 heartbeat cycles and verify that subsequent jobs still trigger on schedule.
  5. Keep alerting on repeated runOnce errors, but avoid process-level restarts as a first reaction.
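
The idea behind steps 3 and 4 can be sketched as follows. This is a minimal illustration of error isolation around a single run, not the actual code from PR #14901: `HeartbeatTask`, `runOnceIsolated`, and `runCycles` are hypothetical names, and the task is synchronous here for brevity (a real heartbeat task is likely asynchronous).

```typescript
// Hypothetical sketch: these names are illustrative, not OpenClaw's actual API.
type HeartbeatTask = () => void;

interface RunResult {
  ok: boolean;
  error?: unknown;
}

// Run one heartbeat cycle and never let a task error escape to the caller.
function runOnceIsolated(task: HeartbeatTask): RunResult {
  try {
    task();
    return { ok: true };
  } catch (error) {
    // Record the failure but do not rethrow, so the next cycle can still run.
    console.error("heartbeat runOnce failed:", error);
    return { ok: false, error };
  }
}

// Drive a fixed number of cycles to show that one failure does not stop later ones.
function runCycles(task: HeartbeatTask, cycles: number): RunResult[] {
  const results: RunResult[] = [];
  for (let i = 0; i < cycles; i++) {
    results.push(runOnceIsolated(task));
  }
  return results;
}
```

With a task that throws only on its second call, `runCycles(task, 4)` still performs all four calls and reports the second one as failed, which mirrors the behavior the steps ask you to confirm.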
Commands
openclaw gateway status
openclaw gateway restart
openclaw status
Verify

After a forced runOnce failure, the heartbeat scheduler continues running and the following intervals execute on schedule.
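
One way to make this check concrete is to compare the start times of consecutive heartbeat cycles against the configured interval. The helper below is a sketch under assumptions: `findLateCycles` is a hypothetical name, and the timestamps are values you have extracted yourself (the heartbeat log format is not assumed here).

```typescript
// Hypothetical helper: `timestampsMs` are heartbeat start times in milliseconds,
// collected by you; no particular log format is assumed.
function findLateCycles(
  timestampsMs: number[],
  intervalMs: number,
  toleranceMs: number
): number[] {
  const late: number[] = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    const delta = timestampsMs[i] - timestampsMs[i - 1];
    // A gap longer than interval + tolerance suggests a skipped or stalled cycle.
    if (delta > intervalMs + toleranceMs) {
      late.push(i); // index of the cycle that arrived late
    }
  }
  return late;
}
```

An empty result over the 3-5 cycles following the injected failure is consistent with the scheduler having survived; a non-empty result points at the cycle where a gap appeared.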

Caveats
  • Error isolation prevents scheduler death, but does not fix the root cause of the failing task.
  • If failures are side-effectful (e.g., partial writes), add idempotency guards separately (needs verification).
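
As an illustration of the second caveat, an idempotency guard can be as simple as remembering which units of work have completed. The sketch below is an assumption-laden example, not part of OpenClaw: `IdempotencyGuard` and `apply` are hypothetical names, and the state is kept in memory only, so it would not survive a process restart.

```typescript
// Hypothetical in-memory idempotency guard; not an OpenClaw API.
class IdempotencyGuard {
  private readonly done = new Set<string>();

  // Runs `effect` only if `key` has not completed before.
  // Returns true if the effect ran, false if it was skipped.
  apply(key: string, effect: () => void): boolean {
    if (this.done.has(key)) {
      return false;
    }
    // If `effect` throws, the key is not marked done, so a later retry runs it again.
    effect();
    this.done.add(key);
    return true;
  }
}
```

This only prevents a completed side effect from being applied twice. An effect that writes partially and then throws will be re-run in full on retry, so the effect itself still has to tolerate being repeated.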
Source attribution

This tip is aggregated from community/public sources and preserved with attribution.
