Understanding Cache Compression

Performance Engineering in Distributed Systems: Lessons That Compound Over Time

Years of working with large-scale distributed systems have reinforced a lesson that only becomes clearer with time: ...

RLWRLD releases RLDX-1, a dexterity-first foundation model for robot hands

RLWRLD said with RLDX-1, it aimed to include things like context memorization or force sensing, which existing models often ...

11d

5% GPU utilization: The $401 billion AI infrastructure problem enterprises can't keep ignoring

Enterprises locked in GPU capacity during the AI scramble. Now utilization sits at 5% and the bill is due. Here's what the ...

IEEE

ShrinKV: Key-Value Cache Compression with Progressive Hidden States Shrinking to Mitigate Prefilling Latency

Abstract: The autoregressive attention mechanism in large language models (LLMs) enables the avoidance of redundant computations by storing Key-Value (KV) caches. Existing KV cache compression methods ...

InfoQ

Google’s TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware

Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with content, and download exclusive resources. Dany Lepage discusses the architectural ...

Wall Street Journal

The 2,000-Year-Old Cement Battery That Could Reduce Our Reliance on Fossil Fuel

Adding water to Cache Energy’s cement pellets causes a chemical reaction that releases heat. The reaction is reversible, allowing the system to store heat as well. CACHE ENERGY More than two millennia ...

Forbes

Google’s TurboQuant Compression Could Increase Demand For AI Memory

This voice experience is generated by AI. Learn more. This voice experience is generated by AI. Learn more. On March 24, 2026 Amir Zandieh and Vahab Mirrokni from Google Research published an article ...

Ars Technica

Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

Even if you don’t know much about the inner workings of generative AI models, you probably know they need a lot of memory. Hence, it is currently almost impossible to buy a measly stick of RAM without ...

SDxCentral

TurboQuant: Did Google just drop a compression algorithm capable of stemming RAMageddon?

AI has a growing memory problem. Google thinks it's found the answer, and it doesn't require more or better hardware. Originally detailed in an April 2025 paper, TurboQuant is an advanced compression ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results