PTQ
- Accurate Post Training Quantization With Small Calibration Sets
- Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- AWQ: Activation-Aware Weight Quantization for LLM Compression and Acceleration
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
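
The papers above all quantize a pretrained model using only a small calibration set, typically starting from a round-to-nearest baseline and then compensating the quantization error (OBC, GPTQ) or rescaling salient channels (AWQ, SmoothQuant). As orientation only, here is a minimal per-channel round-to-nearest sketch in NumPy; it is not any paper's reference implementation, and the function names are illustrative.

```python
# Minimal per-channel round-to-nearest (RTN) weight quantizer.
# Baseline sketch only: GPTQ/OBC add second-order error compensation,
# AWQ rescales salient channels, and SmoothQuant migrates activation
# outliers into the weights before quantizing. None of that is shown here.
import numpy as np

def quantize_rtn(W: np.ndarray, n_bits: int = 4):
    """Symmetric per-output-channel quantization of a weight matrix W [out, in]."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 7 for INT4
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)           # avoid divide-by-zero rows
    Q = np.clip(np.round(W / scale), -qmax - 1, qmax)  # integer codes
    return Q.astype(np.int8), scale

def dequantize(Q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return Q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16)).astype(np.float32)
    Q, s = quantize_rtn(W, n_bits=4)
    err = np.abs(W - dequantize(Q, s)).mean()
    print(f"mean abs quantization error: {err:.4f}")
```
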
Rotations
- QuIP: 2-Bit Quantization of Large Language Models with Guarantees
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
- SpinQuant: LLM Quantization with Learned Rotations
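
These papers exploit the fact that an orthogonal rotation R can be inserted around a linear layer without changing its output, while spreading outlier channels so that low-bit quantization loses less accuracy. QuIP/QuIP# and QuaRot use (randomized) Hadamard transforms fused into the model, and SpinQuant learns the rotations; the sketch below uses a generic random orthogonal matrix purely to illustrate the invariance and the outlier-flattening effect, and its names are illustrative rather than taken from any of the papers.

```python
# Rotation trick behind QuIP/QuaRot/SpinQuant-style methods, illustrated:
# for an orthogonal R, (x R)(W R)^T == x W^T, so the layer output is unchanged,
# but the rotated tensors have flatter distributions that quantize better.
import numpy as np

def random_orthogonal(n: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return Q

if __name__ == "__main__":
    d_in, d_out, batch = 64, 32, 16
    rng = np.random.default_rng(1)
    W = rng.normal(size=(d_out, d_in))
    x = rng.normal(size=(batch, d_in))
    x[:, 0] *= 50.0                      # plant an activation outlier channel

    R = random_orthogonal(d_in)
    W_rot, x_rot = W @ R, x @ R          # rotate weights and activations

    # 1) Computational invariance: the layer output is unchanged.
    assert np.allclose(x @ W.T, x_rot @ W_rot.T, atol=1e-8)

    # 2) The rotation spreads the outlier across channels, shrinking the
    #    dynamic range that a per-channel quantizer must cover.
    print("largest per-channel peak |x| before:", np.abs(x).max(axis=0).max())
    print("largest per-channel peak |x| after :", np.abs(x_rot).max(axis=0).max())
```
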