AWQ search for accurate quantization. Pre-computed AWQ model zoo for LLMs (LLaMA, Llama2, OPT, CodeLlama, StarCoder, Vicuna, LLaVA; load to generate quantized weights). Memory-efficient 4-bit Linear ...
Sensitivity analysis (and partial quantization) example is also provided. The figure below shows per-layer sensitivity analysis result of efficientnet_lite0 model. Only the static post-training ...
Abstract: Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the ...
Abstract: The huge memory and computing costs of deep neural networks (DNNs) greatly hinder their deployment on resource-constrained devices with high efficiency. Quantization has emerged as an ...