IndexCache Delivers 1.82× Speedup for Long-Context LLM Inference — and It’s Already Open Source
Researchers from Tsinghua University and Z.ai have released IndexCache, a sparse attention optimizer that cuts redundant computation by 75% in DeepSeek-based LLMs, delivering 1.82× faster inference at 200K-token context lengths.