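Cron scheduler error isolation: keep a single bad job from bringing down every scheduled job
Targets the scenario where one malformed cron expression stops all scheduled jobs. A community PR offers an actionable fix: isolate exceptions per job, count errors, and automatically disable high-risk jobs.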
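The isolation pattern described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code from the PR: `Job`, `tick`, and `MAX_CONSECUTIVE_ERRORS` (set to 3 here) are hypothetical names, and `run` stands in for whatever computes a job's next fire time and executes it. Each job runs inside its own try/except, so an exception increments only that job's counter and the loop moves on to the next job.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

log = logging.getLogger("cron-sketch")

# Hypothetical threshold; the PR may use a different value or setting name.
MAX_CONSECUTIVE_ERRORS = 3


@dataclass
class Job:
    name: str
    run: Callable[[], None]  # stands in for "compute next fire time and execute"
    enabled: bool = True
    consecutive_errors: int = 0


def tick(jobs: List[Job]) -> None:
    """Run one scheduler pass; an exception in one job never aborts the pass."""
    for job in jobs:
        if not job.enabled:
            continue  # auto-disabled jobs are skipped until someone re-enables them
        try:
            job.run()
        except Exception as exc:  # isolate the failure to this job only
            job.consecutive_errors += 1
            log.warning("job %s failed (%d in a row): %s",
                        job.name, job.consecutive_errors, exc)
            if job.consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                job.enabled = False
                log.error("auto-disabled job %s", job.name)
        else:
            # A successful run is treated as recovery and resets the counter.
            job.consecutive_errors = 0
```

With one healthy job, one job whose `run` always raises, and another healthy job, four calls to `tick` run both healthy jobs four times; the broken job is disabled after its third failure and skipped on the fourth pass.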
Source: GitHub | Discovered: 2026-02-12 | Author: MarvinDontPanic
Prerequisites
- You run multiple cron jobs, and at least one of them can be edited by humans or automation.
- Gateway logs are retained so consecutive scheduler failures can be observed.
Steps
- Audit current jobs for malformed schedules (invalid cron expression, wrong timezone, null everyMs).
- Upgrade to a build containing PR #14385, then restart the gateway.
- Inject one intentionally malformed schedule in staging to verify per-job try/catch isolation.
- Confirm that the job's error counter increments and that auto-disable triggers after repeated failures.
- Fix the schedule and verify that recovery resets the error count while the other jobs stay healthy.
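For the audit step, a rough pre-flight check can flag the three risk classes listed above. The schedule record shape (`kind`, `expr`, `tz`, `everyMs`) is an assumption modeled on the fields this tip mentions, not the gateway's actual schema, and `audit` and `cron_field_ok` are hypothetical helpers. The cron check is deliberately simplified: it accepts only numeric 5-field expressions, so names such as `MON` or macros such as `@daily` are reported for manual review rather than silently passed. The timezone check uses the stdlib `zoneinfo` module (Python 3.9+).

```python
import re
from typing import List
from zoneinfo import ZoneInfo

# Numeric range of each field in a standard 5-field cron expression:
# minute, hour, day of month, month, day of week (0 and 7 both mean Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
# One list item: "*", "N", or "N-M", optionally followed by "/STEP".
_ITEM = re.compile(r"(\*|\d+(?:-\d+)?)(?:/(\d+))?")


def cron_field_ok(field: str, lo: int, hi: int) -> bool:
    for item in field.split(","):
        m = _ITEM.fullmatch(item)
        if not m:
            return False  # names, "?", "L", etc. are left for manual review
        base, step = m.group(1), m.group(2)
        if step is not None and int(step) == 0:
            return False
        if base != "*":
            bounds = [int(p) for p in base.split("-")]
            if any(b < lo or b > hi for b in bounds) or bounds[0] > bounds[-1]:
                return False
    return True


def audit(schedule: dict) -> List[str]:
    """Return the problems found in one (hypothetical) schedule record."""
    problems = []
    kind = schedule.get("kind")
    if kind == "cron":
        fields = str(schedule.get("expr") or "").split()
        if len(fields) != 5:
            problems.append(f"expected 5 cron fields, got {len(fields)}")
        elif not all(cron_field_ok(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)):
            problems.append("cron field out of range or unsupported syntax")
        tz = schedule.get("tz")
        if tz is not None:
            try:
                ZoneInfo(tz)
            except Exception:  # unknown key, malformed key, or non-string value
                problems.append(f"unknown timezone {tz!r}")
    elif kind == "every":
        every = schedule.get("everyMs")
        if isinstance(every, bool) or not isinstance(every, int) or every <= 0:
            problems.append(f"everyMs must be a positive integer, got {every!r}")
    else:
        problems.append(f"unknown schedule kind {kind!r}")
    return problems
```

For example, `"0 9 * * 1-5"` passes, `"61 * * * *"` is flagged because the minute is out of range, and `{"kind": "every", "everyMs": None}` is reported as a null interval.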
Commands
openclaw gateway status
openclaw gateway restart
openclaw help
Verify
A broken cron job no longer blocks healthy jobs: the scheduler loop keeps running and only the faulty job is disabled.
Caveats
- The PR was still open at discovery time; behavior and field names may change before merge (needs verification).
- The auto-disable threshold may need policy tuning for bursty jobs.
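One way to tune the threshold for bursty jobs is to express it as a failure rate (N failures within a time window) instead of a fixed count of consecutive failures, so the limit can be sized to how often the job fires. This is a hypothetical alternative policy, not something the PR is known to implement; `WindowedErrorPolicy`, `max_errors`, and `window_s` are illustrative names, and the caller passes timestamps in explicitly (for example from `time.monotonic()`) to keep the logic easy to test.

```python
from collections import deque


class WindowedErrorPolicy:
    """Hypothetical rate-based threshold: trip after max_errors failures within window_s seconds."""

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self._failures = deque()  # timestamps of recent failures, oldest first

    def record_error(self, now: float) -> bool:
        """Record a failure at time `now`; return True if the job should be disabled."""
        self._failures.append(now)
        # Forget failures that have aged out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) >= self.max_errors

    def record_success(self) -> None:
        """A successful run counts as recovery and clears the failure history."""
        self._failures.clear()
```

With `max_errors=3` and `window_s=60`, failures at t=0, 10, and 100 do not trip the policy because the first two have aged out by t=100; two more failures at t=110 and t=120 do.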
Source attribution
This tip is aggregated from community/public sources and preserved with attribution.