
Record lastErrorReason on Cron failures to support reason-level operations

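For the "job failed but slow to diagnose" scenario: adds a failure reason classification (billing/rate_limit/auth/timeout/format) that can directly drive alert severity tiers and triage routing.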

GITHUB | Discovered 2026-02-12 | Author futuremind2026
Prerequisites
  • Cron jobs already emit error logs and you can view job state fields.
  • Team has a basic incident taxonomy (auth, timeout, quota, etc.).
Steps
  1. Upgrade to a build containing PR #14382 and restart the gateway.
  2. Create controlled failure cases (expired key, forced timeout, invalid format) in staging jobs.
  3. Inspect job state and verify that lastErrorReason is populated with the expected class for each injected failure.
  4. Map each reason to runbook actions: retry/backoff/escalate/manual fix.
  5. Add dashboard/alert routing by reason to reduce mean-time-to-diagnosis.
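Steps 4 and 5 could be sketched roughly as below. This is a minimal illustration, not the project's implementation: the `Route` type, the action and severity names, and the shape of the job-state dict are all assumptions. Only the `lastErrorReason` field name and the five reason values come from the tip itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    action: str    # runbook action: retry, backoff, escalate, or manual_fix
    severity: str  # hypothetical alert tier: page, ticket, or info


# Hypothetical reason-to-runbook mapping; adjust to your own incident taxonomy.
ROUTES = {
    "billing": Route("escalate", "page"),
    "rate_limit": Route("backoff", "info"),
    "auth": Route("manual_fix", "page"),
    "timeout": Route("retry", "ticket"),
    "format": Route("manual_fix", "ticket"),
}

# Anything missing or unrecognized goes to a human rather than being dropped.
DEFAULT_ROUTE = Route("escalate", "ticket")


def route_for(job_state: dict) -> Route:
    """Pick a runbook route from a job-state dict (assumed shape)."""
    reason = job_state.get("lastErrorReason")
    return ROUTES.get(reason, DEFAULT_ROUTE)


if __name__ == "__main__":
    print(route_for({"lastErrorReason": "rate_limit"}))
```

Keeping the mapping in one table makes it easy to review alongside the runbook, and the default route means a new or unexpected reason still reaches a responder.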
Commands
openclaw gateway status
openclaw gateway restart
openclaw help
Verify

Failed jobs carry reason tags that match injected failures, and responders can triage without reading full stack traces.
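One way to make this check repeatable in staging is to compare the reason you expected from each injected failure with the reason actually recorded. The sketch below assumes you have already collected job states into a dict keyed by job ID; the job IDs, the dict shape, and the `mismatched_reasons` helper are hypothetical.

```python
def mismatched_reasons(expected: dict, job_states: dict) -> dict:
    """Return {job_id: (expected_reason, observed_reason)} for every mismatch.

    `expected` maps job IDs to the reason each injected failure should produce.
    `job_states` maps job IDs to job-state dicts (assumed shape).
    """
    mismatches = {}
    for job_id, want in expected.items():
        got = job_states.get(job_id, {}).get("lastErrorReason")
        if got != want:
            mismatches[job_id] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Hypothetical staging jobs, one per controlled failure case.
    expected = {
        "expired-key-job": "auth",
        "slow-job": "timeout",
        "bad-payload-job": "format",
    }
    observed = {
        "expired-key-job": {"lastErrorReason": "auth"},
        "slow-job": {"lastErrorReason": "timeout"},
        "bad-payload-job": {},  # reason not populated
    }
    print(mismatched_reasons(expected, observed))
```

An empty result means every injected failure was tagged as expected; any entry points at a case that still needs stack-trace reading.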

Caveats
  • This change is observability-first; scheduling policy changes are separate follow-ups (needs verification).
  • Reason parsing may place unknown provider errors into generic buckets.
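To keep an eye on the second caveat, you could track how many failures land outside the five named classes. The sketch below is an assumption-laden illustration: it treats any value other than the five reasons listed in this tip (including a missing value) as unclassified, and the actual name of any generic bucket is not specified here.

```python
# The five reason classes named in this tip.
KNOWN_REASONS = {"billing", "rate_limit", "auth", "timeout", "format"}


def unclassified_share(reasons) -> float:
    """Fraction of failures whose reason is missing or not a known class."""
    reasons = list(reasons)
    if not reasons:
        return 0.0
    other = sum(1 for r in reasons if r not in KNOWN_REASONS)
    return other / len(reasons)


if __name__ == "__main__":
    # Hypothetical sample of lastErrorReason values from recent failures.
    sample = ["auth", "timeout", None, "some_generic_bucket"]
    print(f"{unclassified_share(sample):.0%} of failures are unclassified")
```

A rising share suggests a provider has started returning errors the parser does not recognize, which is a cue to review those jobs manually.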
Source attribution

This tip is aggregated from community/public sources and preserved with attribution.
