Heartbeat runOnce error isolation and self-healing (PR #14901)

Addresses the case where the whole scheduler stalls after a single heartbeat task throws: runOnce gains error isolation so that subsequent cycles keep running.

Source: GitHub | Discovered: 2026-02-13 | Author: shtse8

Prerequisites

Steps

Identify one heartbeat task that occasionally fails and capture its stack trace signature.
Upgrade to a build that includes PR #14901 and restart the gateway in a low-risk window.
Inject a controlled failure in the runOnce path (or replay a known bad input) to confirm that the scheduler survives.
Observe the next 3-5 heartbeat cycles and verify that subsequent jobs still trigger on schedule.
Keep alerting on repeated runOnce errors, but avoid process-level restarts as a first reaction.
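
The isolation and alerting behaviour described in the steps can be pictured with a small sketch. This is not the code from PR #14901; the names IsolatedHeartbeat, tick, alertThreshold and onAlert are hypothetical and chosen only for illustration. It assumes runOnce is an async function and that an alert should fire once a configurable number of consecutive cycles have failed.

```typescript
// Illustrative sketch only. All names here are hypothetical and are not
// taken from the OpenClaw code base or from PR #14901.
type HeartbeatTask = () => Promise<void>;

interface TickResult {
  ok: boolean;
  error?: unknown;
}

class IsolatedHeartbeat {
  readonly results: TickResult[] = [];
  private consecutiveFailures = 0;
  private readonly runOnce: HeartbeatTask;
  private readonly alertThreshold: number;
  private readonly onAlert: (failures: number, error: unknown) => void;

  constructor(
    runOnce: HeartbeatTask,
    alertThreshold: number,
    onAlert: (failures: number, error: unknown) => void,
  ) {
    this.runOnce = runOnce;
    this.alertThreshold = alertThreshold;
    this.onAlert = onAlert;
  }

  // One scheduler cycle. Any error thrown or rejected by runOnce is caught
  // here, so it cannot propagate to the caller and stop later cycles.
  async tick(): Promise<TickResult> {
    let result: TickResult;
    try {
      await this.runOnce();
      this.consecutiveFailures = 0;
      result = { ok: true };
    } catch (error) {
      this.consecutiveFailures += 1;
      // Alert on repeated failures instead of restarting the process.
      if (this.consecutiveFailures >= this.alertThreshold) {
        this.onAlert(this.consecutiveFailures, error);
      }
      result = { ok: false, error };
    }
    this.results.push(result);
    return result;
  }
}

// Controlled failure injection: cycles 2 and 3 throw, the rest succeed.
async function simulate(): Promise<{ outcomes: boolean[]; alerts: number[] }> {
  let cycle = 0;
  const alerts: number[] = [];
  const heartbeat = new IsolatedHeartbeat(
    async () => {
      cycle += 1;
      if (cycle === 2 || cycle === 3) {
        throw new Error(`injected failure in cycle ${cycle}`);
      }
    },
    2, // alert after two consecutive failures
    (failures) => alerts.push(failures),
  );
  for (let i = 0; i < 5; i++) {
    await heartbeat.tick();
  }
  return { outcomes: heartbeat.results.map((r) => r.ok), alerts };
}
```

Under these assumptions, simulate() resolves with outcomes [true, false, false, true, true] and a single alert raised at the second consecutive failure. Cycles 4 and 5 still run, which is the property the steps ask you to confirm on a real gateway.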

Commands

openclaw gateway status

openclaw gateway restart

openclaw status

Verify

After a forced runOnce failure, the heartbeat scheduler continues running and the next intervals are executed.
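
One way to make this check less subjective is to compare the gaps between observed heartbeat cycles against the configured interval. The helper below is a hypothetical sketch: it assumes you have already collected one epoch-millisecond timestamp per observed cycle (how you obtain them depends on your logging, which is not specified here), and the function name and tolerance parameter are illustrative.

```typescript
// Hypothetical helper, not part of OpenClaw. Returns the indices of cycles
// that arrived later than expectedIntervalMs + toleranceMs after the
// previous cycle, which suggests one or more intervals were skipped.
function findMissedIntervals(
  timestampsMs: number[],
  expectedIntervalMs: number,
  toleranceMs: number,
): number[] {
  const missed: number[] = [];
  let previous: number | undefined;
  timestampsMs.forEach((current, index) => {
    if (previous !== undefined && current - previous > expectedIntervalMs + toleranceMs) {
      missed.push(index);
    }
    previous = current;
  });
  return missed;
}

// Example: a 60 s interval with 1 s tolerance. The last gap is about 120 s,
// so the cycle at index 4 is reported as late.
const lateCycles = findMissedIntervals(
  [0, 60_000, 120_000, 180_500, 300_000],
  60_000,
  1_000,
);
```

An empty result over the 3-5 cycles following the injected failure is consistent with the scheduler having survived; a non-empty result points at the cycle where the schedule slipped.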

Caveats

Error isolation prevents scheduler death, but does not fix the root cause of the failing task.
If failures are side-effectful (e.g., partial writes), add idempotency guards separately (needs verification).
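
As a rough illustration of what such a guard could look like, the sketch below records each side effect under a key once it has completed, so that a retried cycle skips the steps that already succeeded. Everything here is an assumption for illustration: IdempotencyGuard, apply and the key format are hypothetical, and the in-memory Set would most likely have to be replaced by a durable store in a real deployment.

```typescript
// Hypothetical in-memory guard, not part of OpenClaw.
class IdempotencyGuard {
  private readonly completed = new Set<string>();

  // Runs `effect` at most once per key. The key is recorded only after the
  // effect returns, so an effect that throws is attempted again next time.
  apply(key: string, effect: () => void): boolean {
    if (this.completed.has(key)) {
      return false; // already applied, skip
    }
    effect();
    this.completed.add(key);
    return true;
  }
}

// A task with two writes where the second write fails on the first attempt.
function demoPartialWriteRetry(): string[] {
  const guard = new IdempotencyGuard();
  const written: string[] = [];
  let failSecondWrite = true;

  const task = (): void => {
    guard.apply("cycle-1:write-a", () => {
      written.push("a");
    });
    guard.apply("cycle-1:write-b", () => {
      if (failSecondWrite) {
        throw new Error("injected partial failure");
      }
      written.push("b");
    });
  };

  try {
    task(); // first attempt: "a" is written, then the second write throws
  } catch {
    failSecondWrite = false; // the fault clears before the retry
  }
  task(); // retry: "a" is skipped, "b" is written
  return written;
}
```

In this sketch demoPartialWriteRetry() returns ["a", "b"], so the first write is not duplicated by the retry. Each guarded step still has to be safe to restart from its own beginning, which the guard alone does not ensure.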

Source attribution

This tip is aggregated from community/public sources and preserved with attribution.