IndexCache Delivers 1.82× Speedup for Long-Context LLM Inference — and It’s Already Open Source
Researchers from Tsinghua University and Z.ai have released IndexCache, a sparse attention optimizer that cuts redundant computation by 75% in DeepSeek-based LLMs, delivering 1.82× faster inference at 200K-token context lengths.