Learning Vector Quantization

13d

Model Showcase: TurboQuant, Gemma, and DeepSeek v4

Google is releasing new Gemma models and a new algorithm, DeepSeek v4 is finally available, and Anthropic is making headlines ...

14d

AI inference just plays by different rules

Users and AI agents feel the outliers. A two-millisecond average latency means nothing if one percent of your queries take ...

GitHub

Python implementation of the TurboQuant and QJL vector quantization algorithms.

turboquant-py implements the TurboQuant and QJL vector quantization algorithms from Google Research (ICLR 2026 / AISTATS 2026). It compresses high-dimensional floating-point vectors to 1-4 bits per ...

1mon

Google's new TurboQuant algorithm speeds up AI memory 8x, cutting costs by 50% or more

Within 24 hours of the release, community members began porting the algorithm to popular local AI libraries like MLX for Apple Silicon and llama.cpp.

Ars Technica

Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

Even if you don’t know much about the inner workings of generative AI models, you probably know they need a lot of memory. Hence, it is currently almost impossible to buy a measly stick of RAM without ...

IEEE

MuQ: Self-Supervised Music Representation Learning With Mel Residual Vector Quantization

Abstract: Recent years have witnessed the success of foundation models pre-trained with self-supervised learning (SSL) in various music informatics understanding tasks, including music tagging, ...

TechNode

Huawei Zurich Lab’s New Open-Source Tech Lets LLMs Run on Consumer GPUs

Huawei’s Zurich Computing Systems Laboratory has released SINQ (Sinkhorn Normalization Quantization), an open-source quantization method that reduces the memory requirements of large language models ...

VentureBeat

Huawei's new open source technique shrinks LLMs to make them run on less powerful, less expensive hardware

Huawei’s Computing Systems Lab in Zurich has introduced a new open-source quantization method for large language models (LLMs) aimed at reducing memory demands without sacrificing output quality.

Business Wire

Elastic Announces Faster Filtered Vector Search with ACORN-1 and Default Better Binary Quantization Compression

SAN FRANCISCO--(BUSINESS WIRE)--Elastic (NYSE: ESTC), the Search AI Company, announced new performance and cost-efficiency breakthroughs with two significant enhancements to its vector search. Users ...

World Bank

Foundational Learning

Foundational learning, which includes basic literacy, numeracy, and socio-emotional skills, is the foundation for a life of learning. These skills also foster social and emotional growth, cognitive ...
