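Prevent "tool starvation" in the learning system: configure seed priors for critical tools
Scenario: after a run of consecutive rejections, Thompson Sampling can push the posterior of critical tools such as web and exec to near zero. Seed arms plus a minimum-pulls guard keep those tools eligible for exploration.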
Source: GitHub | Discovered: 2026-02-14 | Author: Peleke
Prerequisites
- Learning/bandit mechanism is enabled and tool arm statistics are inspectable.
- You can edit learning config and reset learning DB in maintenance windows.
Steps
- Audit current arm posteriors and identify critical tools with near-zero inclusion probability.
- Define `seedArms` patterns for critical categories (web, exec, fs read/write) with conservative boosts.
- Set a non-zero `min_pulls` guard so low-sample arms remain eligible during recovery phase.
- Re-run representative tasks and compare tool selection diversity/accuracy before and after seeding.
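The seeding and guard steps above can be sketched as a small Beta-Bernoulli Thompson sampler. This is a minimal illustration under stated assumptions, not the project's implementation: the `SEED_ARMS` table, the `MIN_PULLS` constant, the tool names, and every function here are hypothetical, and the real `seedArms` / `min_pulls` config may be shaped differently.

```python
import random
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Arm:
    alpha: float = 1.0  # Beta prior pseudo-count for accepted uses
    beta: float = 1.0   # Beta prior pseudo-count for rejected uses
    pulls: int = 0      # real observations recorded so far


# Hypothetical seed table: glob pattern -> (alpha, beta) prior.
# Boosts are deliberately small so real feedback can still override them.
SEED_ARMS = {
    "web_*": (3.0, 1.0),
    "exec": (3.0, 1.0),
    "fs_*": (2.0, 1.0),
}
MIN_PULLS = 5  # hypothetical guard: arms below this count stay eligible


def make_arm(name: str) -> Arm:
    """Create an arm, using a seeded prior if the name matches a pattern."""
    for pattern, (a, b) in SEED_ARMS.items():
        if fnmatchcase(name, pattern):
            return Arm(alpha=a, beta=b)
    return Arm()  # uniform Beta(1, 1) prior for everything else


def update(arm: Arm, accepted: bool) -> None:
    """Record one observation on an arm."""
    arm.pulls += 1
    if accepted:
        arm.alpha += 1
    else:
        arm.beta += 1


def select_tools(arms: dict, k: int, rng: random.Random) -> list:
    """Pick k tools: under-sampled arms first, the rest by Thompson draw."""
    forced = sorted(
        (n for n, a in arms.items() if a.pulls < MIN_PULLS),
        key=lambda n: arms[n].pulls,
    )
    rest = [n for n in arms if n not in forced]
    rest.sort(
        key=lambda n: rng.betavariate(arms[n].alpha, arms[n].beta),
        reverse=True,
    )
    return (forced + rest)[:k]
```

Placing under-sampled arms ahead of the sampled ranking is only one way to read a minimum-pulls guard; a softer variant could instead floor the sampled score for those arms.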
Commands
openclaw gateway status
openclaw logs --local-time
ls ~/.qortex/learning/openclaw.db
Verify
Critical tools reappear in candidate/executed sets and task success rate recovers without broad over-triggering.
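One way to make this check concrete is to log which tools were selected in a before run and an after run, then compare two simple numbers: the Shannon entropy of the selection distribution (a rough diversity measure) and the fraction of critical tools that appear at all. The sketch below assumes the selections are available as plain lists of tool names; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def selection_entropy(selected_tools: list) -> float:
    """Shannon entropy in bits of the observed tool-selection distribution."""
    counts = Counter(selected_tools)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def critical_coverage(selected_tools: list, critical: list) -> float:
    """Fraction of critical tools that were selected at least once."""
    if not critical:
        return 1.0
    seen = set(selected_tools)
    return sum(1 for t in critical if t in seen) / len(critical)


# Hypothetical logs: tool names selected across a batch of tasks.
before = ["notes", "notes", "notes", "notes"]
after = ["notes", "web_search", "exec", "notes"]
critical = ["web_search", "exec"]

print(selection_entropy(before), critical_coverage(before, critical))
print(selection_entropy(after), critical_coverage(after, critical))
```

Higher coverage with a moderate rise in entropy matches the recovery described above. A jump to near-uniform selection would instead suggest broad over-triggering.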
Caveats
- Over-boosted priors can bias selection and reduce adaptability for new tasks.
- The referenced issue describes ongoing implementation work, so the final config keys may differ (needs verification).
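The over-boosting caveat can be quantified with the Beta posterior mean. With prior Beta(alpha, beta) and f consecutive rejections, the mean is alpha / (alpha + beta + f), so a larger alpha means more rejections are needed before the sampler backs off. The sketch below counts them for a given threshold. The helper name and the example priors are illustrative assumptions, not values taken from the project.

```python
def failures_to_drop_below(alpha: float, beta: float, threshold: float) -> int:
    """Smallest f such that alpha / (alpha + beta + f) < threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    f = 0
    while alpha / (alpha + beta + f) >= threshold:
        f += 1
    return f


# Consecutive rejections needed before the posterior mean falls below 0.25.
print(failures_to_drop_below(1, 1, 0.25))   # uniform prior      -> 3
print(failures_to_drop_below(3, 1, 0.25))   # conservative seed  -> 9
print(failures_to_drop_below(20, 1, 0.25))  # over-boosted seed  -> 60
```

A conservative seed delays the collapse by a handful of rejections. An over-boosted one keeps a tool favoured long after feedback says otherwise.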
Source attribution
This tip is aggregated from community/public sources and preserved with attribution.