Memory (QMD Hybrid Search)
KubeClaw ships QMD as the default memory backend. QMD is a local-first hybrid search engine that combines two retrieval strategies and a reranking step into a single ranked result set:
- BM25 full-text search for precise keyword matching
- Vector similarity search for semantic recall via embeddings
- MMR (Maximal Marginal Relevance) reranking to balance relevance with diversity
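To make the BM25 strategy concrete, here is a sketch that scores pre-tokenized documents with the textbook Okapi BM25 formula. It is not QMD's implementation. The tokenization, the k1 = 1.2 and b = 0.75 parameters, and the function name bm25_scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    k1 and b are common textbook defaults, assumed here; QMD's values may differ.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: the number of docs that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # The +1 inside the log keeps IDF non-negative (a Lucene-style variant).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Length normalization: long documents need more matches to score high.
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores
```

A document that shares no terms with the query scores exactly zero, which is why the vector strategy is needed to recall paraphrases and synonyms.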
QMD is deployed automatically as an init container and two CronJobs. No external services or databases are required.
How it works
At pod startup, the qmd-init init container (based on the oven/bun image) installs the QMD CLI into a shared emptyDir volume at /qmd-bin. The Gateway's PATH is updated to include this directory so agents can invoke QMD directly.
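An agent process then resolves the CLI from PATH like any other executable. The Python sketch below mimics that PATH update and lookup. The helper names and the binary name qmd are assumptions, and whether the Gateway prepends or appends /qmd-bin is not specified on this page.

```python
import os
import shutil


def with_qmd_on_path(env, qmd_dir="/qmd-bin"):
    """Return a copy of env with qmd_dir prepended to PATH (hypothetical helper)."""
    new_env = dict(env)
    current = new_env.get("PATH", "")
    # Prepending is an assumption; the chart may append the directory instead.
    new_env["PATH"] = qmd_dir + os.pathsep + current if current else qmd_dir
    return new_env


def find_qmd(env, binary="qmd"):
    """Resolve the QMD executable from env's PATH, or return None if absent.

    The binary name "qmd" is an assumption about what qmd-init installs.
    """
    return shutil.which(binary, path=env.get("PATH", ""))
```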
When an agent queries memory, QMD runs a three-stage pipeline:
- BM25 lexical pass scores documents by term frequency and inverse document frequency
- Vector pass computes cosine similarity between the query embedding and the document embeddings precomputed by the embed job
- MMR reranking merges both result sets, promoting documents that are both relevant and non-redundant
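The MMR stage can be sketched as the classic greedy algorithm: repeatedly pick the candidate with the best trade-off between its relevance score and its highest cosine similarity to the documents already chosen. The lam value of 0.7 and the helper names are assumptions for illustration, not documented QMD settings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(relevance, embeddings, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    relevance: per-document relevance scores (for example, the hybrid score).
    embeddings: per-document vectors used to measure redundancy.
    lam: 1.0 ranks purely by relevance, 0.0 purely by diversity; 0.7 is assumed.
    Returns the selected document indices in ranked order.
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Penalize a candidate by its similarity to the closest selected doc.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With relevance [0.9, 0.85, 0.5] and the first two documents sharing an identical embedding, mmr(..., k=2) returns [0, 2]: the near-duplicate is skipped in favor of a less relevant but distinct document. Setting lam=1.0 removes the diversity penalty and returns [0, 1].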
By default, hybrid scoring weights the vector score at 70% and the BM25 text score at 30%. You can change these weights in the hybrid search section of the Gateway config.
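Because BM25 scores are unbounded while cosine similarity lies in [-1, 1], the two are usually rescaled before they are blended. The sketch below min-max normalizes each list and applies the default 70/30 weights. The normalization choice and the function names are assumptions; QMD may fuse scores differently.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores tie: treat them as uniformly relevant (or irrelevant if zero).
        return [1.0 if hi > 0 else 0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(vector_scores, text_scores, vector_weight=0.7, text_weight=0.3):
    """Blend per-document vector and BM25 scores with the default 70/30 weights.

    Min-max normalization is an assumption made for this sketch.
    """
    v = min_max(vector_scores)
    t = min_max(text_scores)
    return [vector_weight * vs + text_weight * ts for vs, ts in zip(v, t)]
```

For vector scores [0.9, 0.1, 0.5] and BM25 scores [0.0, 8.0, 4.0], the blended scores are approximately [0.7, 0.3, 0.5], so the strongest semantic match ranks first even though it has no keyword overlap.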
CronJobs
Two CronJobs keep the search indexes current. Both jobs iterate over every agent directory and process each agent that has an active state directory.
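The per-agent loop that both jobs share might look like the following sketch. The directory layout (one subdirectory per agent under a common root), the state directory name, and the helper names are hypothetical, because this page does not document the actual paths or how an agent is judged active.

```python
from pathlib import Path


def active_agents(agents_root, state_dirname="state"):
    """Yield agent directories under agents_root that contain a state directory.

    The layout and the "state" name are hypothetical placeholders.
    """
    root = Path(agents_root)
    for agent_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if (agent_dir / state_dirname).is_dir():
            yield agent_dir


def run_job(agents_root, process):
    """Apply process (for example, re-index or re-embed) to every active agent."""
    processed = []
    for agent_dir in active_agents(agents_root):
        process(agent_dir)
        processed.append(agent_dir.name)
    return processed
```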
Update job
Re-indexes each agent's Markdown content into the BM25 search index.
| Key | Default | Description |
|---|---|---|
| qmd.update.enabled | true | Enable the update CronJob |
| qmd.update.schedule | */5 * * * * | Cron schedule (default: every 5 minutes) |
Embed job
Generates and refreshes vector embeddings using node-llama-cpp with GGUF models (~0.6 GB).
| Key | Default | Description |
|---|---|---|
| qmd.embed.enabled | true | Enable the embed CronJob |
| qmd.embed.schedule | */15 * * * * | Cron schedule (default: every 15 minutes) |
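Both default schedules use the standard cron step syntax in the minute field. The helper below handles only the */N * * * * form used here and shows how often each job fires; it is an illustrative sketch, not a general cron parser, and its names are hypothetical.

```python
def expand_step_minutes(field):
    """Expand a cron minute field of the form "*/N" into the minutes it fires on.

    Only the "*/N" form is handled; this is not a general cron parser.
    """
    if not field.startswith("*/"):
        raise ValueError(f"unsupported minute field: {field!r}")
    step = int(field[2:])
    if not 1 <= step <= 59:
        raise ValueError(f"step out of range: {step}")
    return list(range(0, 60, step))


def runs_per_day(schedule):
    """Count daily runs for a "*/N * * * *" schedule."""
    minute, hour, dom, month, dow = schedule.split()
    if (hour, dom, month, dow) != ("*", "*", "*", "*"):
        raise ValueError("this sketch only handles '*/N * * * *' schedules")
    return len(expand_step_minutes(minute)) * 24
```

runs_per_day("*/5 * * * *") is 288 and runs_per_day("*/15 * * * *") is 96. A step restarts at the top of each hour, so a value such as */7 fires at minute 56 and then again at minute 0, not strictly every seven minutes.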
Embed resources
The embedding job is the most resource-intensive QMD component. Default requests and limits:
```yaml
qmd:
  embed:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 2Gi
```
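These values use Kubernetes quantity notation: the m suffix means thousandths of a CPU core, while Mi and Gi are powers of two. The sketch below converts the common suffixes so you can reason about headroom when tuning. parse_quantity is a hypothetical helper that covers only the suffixes it lists.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes from the Kubernetes quantity format.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(value):
    """Convert a quantity string such as "500m", "2", or "512Mi" to a Decimal."""
    value = str(value)
    # Check binary suffixes first so "Mi" is not mistaken for a decimal suffix.
    for suffix, factor in _BINARY.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * factor
    return Decimal(value)
```

parse_quantity("500m") is 0.5 cores and parse_quantity("2Gi") is 2147483648 bytes, so the ~0.6 GB model occupies roughly 28% of the default memory limit.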
If your cluster is resource-constrained, lower the requests and limits and lengthen the embed schedule interval (for example, every 30 or 60 minutes instead of every 15). The update job is lightweight and rarely needs tuning.
Disabling QMD
To disable QMD entirely, disable both CronJobs:
```yaml
qmd:
  update:
    enabled: false
  embed:
    enabled: false
```
The init container is gated on memory being enabled in the chart. When both jobs are disabled, the chart skips the init container as well, so no QMD components are deployed.