- Slim-Llama reduces power needs using binary/ternary quantization
- Achieves a 4.59x efficiency boost, consuming 4.69–82.07 mW at scale
- Supports 3B-parameter models with 489 ms latency, enabling efficiency ...
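The power savings follow from the arithmetic: once weights are constrained to {-1, 0, +1}, matrix multiplies reduce to additions and subtractions, so no multiplier hardware is needed on the datapath. Below is a minimal NumPy sketch of one common ternary quantization scheme (threshold-and-scale, in the style of Ternary Weight Networks); the snippet does not describe Slim-Llama's exact on-chip scheme, so the function name and the 0.7 threshold ratio here are illustrative assumptions.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, threshold_ratio: float = 0.7):
    """Quantize a weight tensor to {-1, 0, +1} plus a per-tensor scale.

    threshold_ratio is an assumed tuning knob: weights whose magnitude
    falls below threshold_ratio * mean(|w|) are zeroed out.
    """
    delta = threshold_ratio * np.abs(w).mean()  # zeroing threshold
    q = np.zeros_like(w)
    q[w > delta] = 1.0
    q[w < -delta] = -1.0
    # Scale that minimizes L2 error over the surviving (nonzero) entries
    nonzero = q != 0
    scale = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return q, scale

# Usage: the dequantized weights are scale * q. A matmul against q needs
# only adds/subtracts, which is where the power reduction comes from.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = ternary_quantize(w)
print(q)
print(scale)
```

Binary quantization is the degenerate case of the same idea: drop the zero level, set every weight to sign(w), and keep a single scale factor.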