Ultra-low precision models are redefining efficiency, not just for inference but for training too. Imagine building and using powerful LLMs even on low-end GPUs. The era of being GPU-poor is ending.
Shouldn't 1 bit have only two possible values? Yet 1.58 bits allows three possible values per weight.
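The figure comes from information content: ternary weights take one of three values, {-1, 0, +1}, and log2(3) ≈ 1.58 bits are needed to encode three states. Here is a minimal sketch of an absmean-style ternary quantizer, assuming a simple per-tensor scale and round-then-clamp scheme (the function name and details are illustrative, not any particular library's API):

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Quantize a float weight tensor to ternary codes {-1, 0, +1}
    with a single per-tensor scale (illustrative absmean-style sketch)."""
    scale = np.mean(np.abs(w)) + eps          # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -1, 1)   # round, then clamp to {-1, 0, +1}
    return q.astype(np.int8), scale           # ternary codes plus one float scale

# Each quantized weight carries at most log2(3) ≈ 1.58 bits of information.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = ternary_quantize(w)
w_hat = q * scale                             # dequantized approximation of w
print(np.unique(q))                           # -> subset of [-1, 0, 1]
print(np.log2(3))                             # -> 1.584962500721156
```

Because every weight is -1, 0, or +1, a matrix multiply against these codes reduces to additions, subtractions, and skips, which is where much of the efficiency gain comes from.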