
Personal Knowledge Base (RAG): Turn links and posts into a searchable knowledge base

Ingest URLs, tweets, articles, and similar inputs, then search them in one place. Suited to long-term research and team knowledge reuse.

GitHub · Discovered 2026-02-10 · Author: hesamsheikh
Prerequisites
  • Prepare a folder-based corpus plan (notes/, links/, summaries/).
  • Choose an embedding/retrieval backend and decide on a refresh cadence.
Steps
  1. Ingest URLs and social posts with metadata: source, author, date, topic.
  2. Chunk documents by semantic section, not fixed length.
  3. Create a retrieval prompt template that asks for citations in every answer.
  4. Run a weekly dedup and quality pass to remove stale or low-signal chunks.
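
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the original author's implementation. It assumes the documents are Markdown (so "semantic section" is approximated by splitting at headings), that exact-match hashing is enough for dedup, and that the names Chunk, chunk_by_section, dedup, PROMPT_TEMPLATE, and build_prompt are hypothetical helpers introduced here.

```python
from __future__ import annotations

import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One retrievable unit plus the metadata from step 1 (hypothetical schema)."""
    text: str
    source: str  # URL or file path the text came from
    author: str
    date: str    # ISO date string
    topic: str


def chunk_by_section(doc: str, **meta: str) -> list[Chunk]:
    """Step 2: split a Markdown document at headings, one chunk per section."""
    # Zero-width split before every line that starts with 1-6 '#' and a space.
    sections = re.split(r"^(?=#{1,6} )", doc, flags=re.MULTILINE)
    return [Chunk(text=s.strip(), **meta) for s in sections if s.strip()]


def dedup(chunks: list[Chunk]) -> list[Chunk]:
    """Step 4 (partial): drop chunks whose text matches after normalization.

    Assumption: only exact duplicates (ignoring case and whitespace) are
    removed; near-duplicate or staleness checks would need more than a hash.
    """
    seen: set[str] = set()
    unique: list[Chunk] = []
    for chunk in chunks:
        normalized = " ".join(chunk.text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


# Step 3: a retrieval prompt that asks for a citation on every claim.
PROMPT_TEMPLATE = """Answer the question using only the numbered sources below.
Cite the sources you rely on as [n] after each claim. If the sources do not
contain the answer, say so.

Sources:
{sources}

Question: {question}
"""


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Fill the template with retrieved chunks, numbered for citation."""
    sources = "\n\n".join(
        f"[{i}] {c.source} ({c.author}, {c.date})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(sources=sources, question=question)
```

Keeping the metadata on every chunk, rather than only on the parent document, means each retrieved passage can be cited on its own, whichever retrieval backend is chosen.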
Commands
mkdir -p data/kb/{raw,processed,index}
npm run build
Verify

Ask three questions whose answers you already know and check that each answer cites the correct source from your corpus.
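
This check can be scripted so that it runs after every re-index. The sketch below assumes a hypothetical answer_fn callable that wraps whatever retrieval pipeline you built and returns the answer as a string. It only tests that the expected source reference appears in the answer, not that the answer itself is correct.

```python
from __future__ import annotations

from typing import Callable


def check_citations(
    cases: list[tuple[str, str]],
    answer_fn: Callable[[str], str],
) -> list[tuple[str, bool]]:
    """Return (question, passed) pairs.

    Each case is (question, expected_source). A case passes when the
    expected source string (for example a URL) occurs in the answer text.
    """
    results: list[tuple[str, bool]] = []
    for question, expected_source in cases:
        answer = answer_fn(question)
        results.append((question, expected_source in answer))
    return results
```

A failing case points to either a retrieval miss or a prompt that does not enforce citations, so it helps to inspect the retrieved chunks for that question before changing the template.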

Caveats
  • Without metadata, retrieval quality drops sharply as the corpus grows.
  • PII handling policy should be documented before importing private notes (needs verification).
Source attribution

This tip is aggregated from community/public sources and preserved with attribution.
