Introducing 1-bit quantization for Llama in torchchat (pytorch#911)

Summary:
Pull Request resolved: pytorch#911
Pull Request resolved: pytorch#910

### THIS DIFF
I introduce the ability to use 1-bit quantization in torchao (i.e. pack and unpack bytes that consist of 1 bit of information). For example, given 8 bytes of 1-bit quantized values:

`00000001` `00000001` `00000000` `00000001` `00000000` `00000000` `00000000` `00000001`

we can pack them into 1 byte, `11010001`, and vice versa.

Main changes:
- Added `uint1.h`, which contains the internal helper functions to pack/unpack 8, 64, and 128 bytes of `uint1`s.
- Modified `bitpack.h` to add case statements for 1-bit quantization in the general functions that perform vectorized packing/unpacking on ARM NEON vectors (32, 64, and 128 values).

### CONTEXT
Refer to the previous diffs introducing 2-5 bit quantization. 2-bit: D62133659

### Optional
I noticed that the individual tests in `test_bitpacking.cpp` for 1, 3, and 5 bits were identical and could potentially be factored out into a group. Maybe for a future diff?

Reviewed By: metascroy

Differential Revision: D63052325
1 parent 23321fb · commit 275541d
Showing 7 changed files with 505 additions and 22 deletions.
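For illustration, here is a minimal scalar sketch of the 8-value 1-bit pack/unpack described in the commit message above. The function names are hypothetical and the bit order simply mirrors the example given there (first value lands in the most significant bit); the actual `uint1.h` helpers also cover 64- and 128-byte variants and the NEON-vectorized paths in `bitpack.h`.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8 values that each carry 1 bit of information (in their LSB)
// into a single byte. The first input ends up in the most significant
// bit, matching the example in the commit message.
// (Illustrative sketch; not the actual torchao implementation.)
inline uint8_t pack_8_uint1_values(const uint8_t* unpacked) {
  uint8_t packed = 0;
  for (int i = 0; i < 8; ++i) {
    packed |= (unpacked[i] & 0x1) << (7 - i);
  }
  return packed;
}

// Inverse of pack_8_uint1_values: expand one byte back into 8 bytes,
// each holding a single bit in its LSB.
inline void unpack_8_uint1_values(uint8_t* unpacked, uint8_t packed) {
  for (int i = 0; i < 8; ++i) {
    unpacked[i] = (packed >> (7 - i)) & 0x1;
  }
}

int main() {
  // The 8 one-bit values from the commit message: 1 1 0 1 0 0 0 1.
  const uint8_t values[8] = {1, 1, 0, 1, 0, 0, 0, 1};
  const uint8_t packed = pack_8_uint1_values(values);
  assert(packed == 0b11010001);  // packed byte from the example

  // Round-trip check: unpacking recovers the original 1-bit values.
  uint8_t roundtrip[8];
  unpack_8_uint1_values(roundtrip, packed);
  for (int i = 0; i < 8; ++i) {
    assert(roundtrip[i] == values[i]);
  }
  return 0;
}
```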