
lmcache-mp

LMCache multiprocess (MP) mode — standalone LMCache server in its own pod/process that vLLM connects to over ZMQ. Gives process isolation, no GIL contention on the inference path, one cache shared by multiple vLLM pods per node, and CPU-memory scaling independent of GPU memory. Covers the `LMCacheMPConnector` path (vs the in-process `LMCacheConnectorV1`), the DaemonSet+Deployment K8s pattern and LMCache Operator, the L1 (CPU DRAM) + L2 (NIXL, fs, mooncake_store, s3, Redis) cascade, the `lmcache/standalone` + `lmcache/vllm-openai` image pair, and the production gotchas (`--no-enable-prefix-caching`, `--disable-hybrid-kv-cache-manager`, vLLM/lmcache version pins, hybrid models unsupported, cache_salt fallback bug).
air-gapped/skills · ★ 3 · AI & Automation · score 79
Install: claude install-skill air-gapped/skills
# LMCache multiprocess (MP) mode

Target audience: operators running vLLM on H100/H200/B200-class GPUs in production who need KV-cache extension beyond HBM and have outgrown the in-process LMCache path. Assumes Kubernetes or bare container deployment.

## Why this exists separately from `vllm-caching`

`vllm-caching` covers vLLM's **native** CPU-offload (`--kv-offloading-size`, `OffloadingConnector`) and the **in-process** `LMCacheConnectorV1` (LMCache linked into the vLLM worker). MP mode is structurally different:

- LMCache runs in its **own process / container / pod** with its own CPU and memory budget.
- vLLM talks to it over **ZMQ** (DEALER/ROUTER pattern, default port 5555).
- One LMCache server can serve **multiple vLLM pods on the same node**; they share the L1 cache.
- L2 cascade (NVMe, S3, Mooncake, HF3FS) is configured on the LMCache side, not vLLM side.

Different image pair, different deployment shape, different troubleshooting surface. Hence its own skill.

## Decision tree: pick a path

Ask in order:

1. **Single vLLM pod, only need CPU DRAM tier, no node-shared cache?** → Native offload (`--kv-offloading-size N --kv-offloading-backend native --disable-hybrid-kv-cache-manager`). Zero extra pods. Use the `vllm-caching` skill, not this one.
2. **Single vLLM pod, need NVMe as a third tier, but no other pod will share the cache?** → In-process `LMCacheConnectorV1` (`--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'` + `LMCAC