Tag: LLM optimization

Google TurboQuant: The Algorithm That Cuts AI Memory Costs by 50% or More

Google's new TurboQuant algorithm cuts AI model memory use by 8x with zero accuracy loss, which could transform the economics of AI deployment.

openx_editor March 31, 2026 3 mins read

AI Models

IndexCache: Tsinghua Researchers Achieve 1.82x Speed Boost for Long-Context AI Inference

Researchers at Tsinghua University have developed IndexCache, a technique that makes long-context AI inference 1.82x faster by eliminating redundant sparse attention computations.

openx_editor March 31, 2026 5 mins read