IndexCache: Tsinghua Researchers Achieve 1.82x Speed Boost for Long-Context AI Inference
Researchers at Tsinghua University have developed IndexCache, a new technique that delivers a 1.82x inference speedup for long-context AI models by eliminating redundant sparse-attention computations.
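To make the core idea concrete, here is a minimal conceptual sketch of caching sparse-attention index selections across decoding steps. This is not the paper's implementation: the class name `IndexCacheSketch`, the single-entry cache keyed on query similarity, the `sim_threshold` parameter, and the block-level top-k selection are all illustrative assumptions about how such a cache might avoid re-scoring every KV block on each step.

```python
# Conceptual sketch only -- the actual IndexCache design, cache keying,
# and invalidation policy in the paper may differ substantially.
import numpy as np

def topk_block_indices(query, block_keys, k):
    """Score each KV block against the query and keep the top-k indices."""
    scores = block_keys @ query          # (num_blocks,)
    return np.argsort(scores)[-k:]       # k highest-scoring blocks

class IndexCacheSketch:
    """Caches a top-k block selection so similar queries skip re-selection."""
    def __init__(self, k, sim_threshold=0.95):
        self.k = k
        self.sim_threshold = sim_threshold  # assumed reuse criterion
        self.last_query = None
        self.last_indices = None

    def select(self, query, block_keys):
        # If the new query is close to the cached one, reuse the cached
        # indices instead of re-scoring every block -- this re-scoring is
        # the "redundant sparse-attention computation" being eliminated.
        if self.last_query is not None:
            sim = query @ self.last_query / (
                np.linalg.norm(query) * np.linalg.norm(self.last_query) + 1e-9)
            if sim >= self.sim_threshold:
                return self.last_indices
        indices = topk_block_indices(query, block_keys, self.k)
        self.last_query, self.last_indices = query.copy(), indices
        return indices

# Toy usage: 64 KV blocks of dimension 128, select 8 blocks per step.
rng = np.random.default_rng(0)
block_keys = rng.standard_normal((64, 128))
cache = IndexCacheSketch(k=8)
q1 = rng.standard_normal(128)
q2 = q1 + 0.01 * rng.standard_normal(128)  # nearly identical next-step query
idx1 = cache.select(q1, block_keys)
idx2 = cache.select(q2, block_keys)        # served from cache, no re-scoring
assert np.array_equal(idx1, idx2)
```

In this toy version, consecutive decoding steps with near-identical queries hit the cache and skip the full block-scoring pass; the speedup in the real system would depend on how often selections can be safely reused.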