https://github.com/ggerganov/llama.cpp/pull/4930
-
The importance matrix of a model's weights assigns each parameter a value that quantifies how large a change in performance is expected from a small change in that parameter.
-
Using a calibration dataset, one can obtain the importance matrix of a model in one of two ways:
- (well-motivated) compute the gradients of the loss with respect to the weights and use the squared gradient as the importance
- (a bit more heuristic) use the square of the activation value as the importance
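
Both options can be sketched on a toy single-output linear model `y = w . x`. This is only an illustration, not the code from the PR: the function names, the squared-error loss, and the averaging over calibration samples are all assumptions made for the example.

```python
def activation_importance(activations):
    """Heuristic option: mean squared activation per input feature."""
    n = len(activations)
    dim = len(activations[0])
    return [sum(x[j] ** 2 for x in activations) / n for j in range(dim)]


def squared_gradient_importance(weights, activations, targets):
    """Gradient option: mean squared gradient of an assumed squared-error
    loss (w . x - t)^2 with respect to each weight."""
    n = len(activations)
    importance = [0.0] * len(weights)
    for x, t in zip(activations, targets):
        err = sum(w * xi for w, xi in zip(weights, x)) - t
        for j, xj in enumerate(x):
            grad = 2.0 * err * xj  # d/dw_j of (w . x - t)^2
            importance[j] += grad * grad / n
    return importance


# Hypothetical calibration data: two samples with two input features each.
calib = [[1.0, 0.0], [3.0, 0.5]]
act_imp = activation_importance(calib)
grad_imp = squared_gradient_importance([1.0, 2.0], calib, [0.0, 4.0])
```

In this toy data the first feature has much larger activations than the second, so it receives the larger activation-based importance.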
-
One can then use these importance values as the weights in a weighted RMSE minimization when quantizing the tensor.
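
A minimal sketch of what such a weighted minimization could look like for one block of weights follows. It assumes symmetric integer levels in `[-nmax, nmax]` with a single scale per block, and a small grid search over candidate scales. The names `quantize_block` and `weighted_error` and the search range are hypothetical and are not taken from llama.cpp.

```python
def weighted_error(weights, importance, scale, q):
    """Importance-weighted squared error of the reconstruction scale * q."""
    return sum(i * (w - scale * qi) ** 2
               for i, w, qi in zip(importance, weights, q))


def quantize_block(weights, importance, nmax=7, n_candidates=19):
    """Pick integer levels q and a scale that minimize the weighted error.

    Returns (error, scale, q). The candidate grid is an assumption.
    """
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 0.0, 0.0, [0] * len(weights)
    best = None
    for k in range(n_candidates):
        # Trial scale between 0.8x and 1.2x of the naive amax / nmax.
        trial = amax / nmax * (0.8 + 0.4 * k / (n_candidates - 1))
        q = [max(-nmax, min(nmax, round(w / trial))) for w in weights]
        den = sum(i * qi * qi for i, qi in zip(importance, q))
        if den == 0.0:
            continue
        # For fixed q, this scale is the weighted least-squares optimum.
        scale = sum(i * w * qi
                    for i, w, qi in zip(importance, weights, q)) / den
        err = weighted_error(weights, importance, scale, q)
        if best is None or err < best[0]:
            best = (err, scale, q)
    if best is None:  # every weight with nonzero q had zero importance
        return 0.0, 0.0, [0] * len(weights)
    return best


w = [0.11, -0.52, 0.98, 0.03]
imp = [1.0, 10.0, 0.1, 1.0]  # hypothetical importance values
err_w, scale_w, q_w = quantize_block(w, imp)
_, scale_u, q_u = quantize_block(w, [1.0] * len(w))  # unweighted baseline
```

Minimizing the weighted sum of squared errors is equivalent to minimizing the weighted RMSE, since the square root and the normalization do not change where the minimum lies. Measured under the same importance weights, the weighted solution is never worse than the unweighted baseline, because both search the same set of candidate level assignments.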