KV Cache Implementation

Verkor Launches Industry's First TurboQuant LLM Inference Accelerator Silicon IP

VerTQ is an accelerator chip that implements Google's TurboQuant algorithm, which reduces KV cache memory usage of Large ...

Elastx Expands Cloud Infrastructure with Lightbits Software-Defined Storage

Lightbits Labs (Lightbits®), inventor of the NVMe® over TCP storage protocol and the first KV cache engine optimized for AI, today announced that its long-standing customer, Elastx, a leading European ...

15d

Lightbits Labs: Elastx Expands Cloud Infrastructure with Lightbits Software-Defined Storage

Eliminating Need for Proprietary Hardware Lightbits Labs (Lightbits), inventor of the NVMe over TCP storage protocol and the first KV cache engine optimized for AI, today announced that its long-stand ...

12h

Micron Technology, Inc. (MU) Presents at J.P. Morgan 54th Annual Global Technology, Media and Communications Conference Transcript

Micron Technology, Inc. (MU) J.P. Morgan 54th Annual Global Technology, Media and Communications Conference May 20, 2026 8:40 AM EDT ...

Morning Overview on MSN

Google’s TurboQuant algorithm slashes the memory bottleneck that limits how many AI models can run at once

Running a large language model is expensive, and a surprising amount of that cost comes down to memory, not computation.

Morning Overview on MSN

Google’s new speed trick makes its open AI models run 3x faster without losing a single point of accuracy

A team of Google researchers has published a technique that could let developers squeeze roughly three times more throughput ...

13d

Open-source AI developer Moonshot AI raises $2B at $20B+ valuation

Chinese artificial intelligence lab Moonshot AI has raised $2 billion in funding at a valuation exceeding $20 billion.

AI’s Memory Crisis Is Here: Don’t Hoard, Optimize

The GPUs powering today's models carry limited high-bandwidth memory (HBM) before external memory is required—that's the ...

Geeky Gadgets

Meet oMLX: Apple Silicon’s Fastest Local AI Model Runner

oMLX is a specialized inference engine designed to harness the full capabilities of Apple Silicon for running local AI models. By using Apple’s MLX framework and advanced memory management techniques, ...

Silicon Motion Technology Corporation (SIMO) Presents at J.P. Morgan 54th Annual Global Technology, Media and Communications Conference Transcript

So good morning, everyone, and thank you for joining JPMorgan's 54th Annual Technology, Media and Communications Conference. My name is Mayur Ramdhani, SMID-cap analyst at JPMorgan covering U.S. semis ...
