Voice Input Observability in Practice: Enable `echoTranscript` to Echo Before Executing

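Problem/scenario: Transcription errors in voice input feed directly into downstream automated execution, and users have little chance to notice them in time. Prerequisites: an audio transcription provider is enabled, and you can modify `tools.media.audio`. Steps: 1) enable `echoTranscript`; 2) configure `echoFormat`; 3) add manual confirmation to high-risk flows; 4) record misrecognized samples and iterate on prompts. Key configuration: `tools.media.audio.echoTranscript`, `echoFormat`. Verification: every voice message produces a readable echo before the agent executes. Risks and limits: the echo can expose sensitive dictated content, so use caution in group chats. Source attribution: PR #32150.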

Source: GitHub | Discovered: 2026-03-07 | Author: AytuncYildizli

Prerequisites

An audio transcription provider is configured and functioning.
You can edit the media-understanding settings and redeploy the gateway.

Steps

Set `tools.media.audio.echoTranscript: true` in config for target environments.
Customize `echoFormat` to include a clear prefix (for example `🎙️ Heard: {transcript}`).
For risky actions, require user confirmation after transcript echo and before tool execution.
Collect false-transcription samples and refine prompts/language hints weekly.
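
The echo-then-confirm flow in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's actual implementation: the `AUDIO_CONFIG` dict only mirrors the two setting names from this tip, and `RISKY_KEYWORDS`, `render_echo`, `is_risky`, and `handle_voice_message` are hypothetical names introduced for the example. It also assumes `{transcript}` is the only placeholder in `echoFormat`.

```python
from typing import Callable, Optional

# Hypothetical in-memory mirror of the two settings named in this tip;
# the real config file layout is not shown in the source.
AUDIO_CONFIG = {
    "echoTranscript": True,
    "echoFormat": "🎙️ Heard: {transcript}",
}

# Illustrative keyword list only; a real deployment would define its own policy.
RISKY_KEYWORDS = ("delete", "transfer", "deploy")


def render_echo(transcript: str, config: dict = AUDIO_CONFIG) -> Optional[str]:
    """Return the echo text, or None when echoing is disabled."""
    if not config.get("echoTranscript"):
        return None
    # str.replace (rather than str.format) tolerates braces inside the transcript.
    return config["echoFormat"].replace("{transcript}", transcript)


def is_risky(transcript: str) -> bool:
    """Flag transcripts that mention any of the illustrative risky keywords."""
    lowered = transcript.lower()
    return any(word in lowered for word in RISKY_KEYWORDS)


def handle_voice_message(
    transcript: str,
    send: Callable[[str], None],
    confirm: Callable[[str], bool],
    execute: Callable[[str], None],
) -> bool:
    """Echo first, ask for confirmation on risky requests, then execute.

    Returns True if the request was executed, False if it was declined.
    """
    echo = render_echo(transcript)
    if echo is not None:
        send(echo)  # the user sees what was heard before anything runs
    if is_risky(transcript) and not confirm(transcript):
        return False
    execute(transcript)
    return True
```

Passing `send`, `confirm`, and `execute` as callables keeps the ordering logic testable without a real chat channel or agent.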

Commands

openclaw gateway status

openclaw gateway restart

openclaw help

Verify

Voice messages consistently produce transcript echo before any downstream agent action.
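
One way to check this ordering offline is to scan a chronological event log for actions that ran without a prior echo. The sketch below assumes a hypothetical log shape of `(message_id, kind)` tuples, where `kind` is `"echo"` or `"action"`; the source does not describe the gateway's real log format, so adapt the parsing to whatever your deployment records.

```python
from typing import Iterable, List, Tuple


def find_unechoed_actions(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Return message IDs whose action ran before any echo was sent.

    `events` must be in chronological order. The (message_id, kind)
    tuple shape is an assumption made for this example.
    """
    echoed = set()
    violations = []
    for message_id, kind in events:
        if kind == "echo":
            echoed.add(message_id)
        elif kind == "action" and message_id not in echoed:
            violations.append(message_id)
    return violations


# Example: m1 is echoed before its action; m2 acts first and is flagged.
sample_events = [
    ("m1", "echo"),
    ("m1", "action"),
    ("m2", "action"),
    ("m2", "echo"),
]
violations = find_unechoed_actions(sample_events)
```

An empty result means every logged action was preceded by an echo for the same message.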

Caveats

Avoid enabling transcript echo in sensitive group chats without consent controls.
Accent/noise robustness varies by provider and language pack (needs verification).

Source attribution

This tip is aggregated from community/public sources and preserved with attribution.
