https://github.com/ggerganov/llama.cpp/pull/4930

  • The importance matrix of a model's weights assigns each parameter a value that quantifies how much the model's performance is expected to change in response to a small change in that parameter

  • Using a calibration dataset, one can obtain the importance matrix of a model by

    • (well-motivated) computing the gradients and using the squared gradient as the importance
    • (a bit more heuristic) using the square of the activation value as the importance
  • One can then use these importance values as the weights in a weighted RMSE minimization when quantizing the tensor.
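
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the linked PR: the function names (`activation_importance`, `weighted_error`, `quantize_row`), the symmetric 4-bit integer grid, and the candidate-scale search are all hypothetical choices made for the example. It uses the activation-based variant, taking the mean squared activation per input channel as the importance, and then picks the quantization scale that minimizes the importance-weighted squared error (which has the same minimizer as the weighted RMSE).

```python
def activation_importance(calibration_inputs):
    """Mean squared activation per input channel over the calibration samples."""
    n = len(calibration_inputs)
    dim = len(calibration_inputs[0])
    return [sum(x[j] ** 2 for x in calibration_inputs) / n for j in range(dim)]


def weighted_error(weights, dequantized, importance):
    """Importance-weighted sum of squared quantization errors."""
    return sum(i * (w - d) ** 2 for w, d, i in zip(weights, dequantized, importance))


def quantize_row(weights, importance, n_bits=4, n_candidates=64):
    """Return (scale, integers) minimizing the weighted error over a scale search.

    Assumption: a symmetric signed integer grid with a single scale per row.
    """
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -qmax - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 0.0, [0] * len(weights)

    best = None
    for k in range(1, n_candidates + 1):
        # Try scales around the naive choice max_abs / qmax.
        scale = max_abs / qmax * (0.5 + k / n_candidates)
        q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
        # With the integers fixed, the weighted least-squares scale has a closed form.
        den = sum(i * qi * qi for i, qi in zip(importance, q))
        if den > 0:
            scale = sum(i * w * qi for i, w, qi in zip(importance, weights, q)) / den
        err = weighted_error(weights, [scale * qi for qi in q], importance)
        if best is None or err < best[0]:
            best = (err, scale, q)
    return best[1], best[2]


# Hypothetical calibration activations (two samples, four input channels)
# and one row of weights that multiplies those channels.
calibration = [[0.1, 2.0, -0.3, 1.5], [0.2, -1.8, 0.1, 1.2]]
importance = activation_importance(calibration)
row = [0.82, -0.11, 0.47, -0.95]
scale, q = quantize_row(row, importance)
dequantized = [scale * v for v in q]
```

Passing a constant importance such as `[1.0] * len(row)` reduces this to an ordinary unweighted RMSE fit, which makes it easy to compare the two on the same row: channels with large importance are reproduced more accurately at the expense of channels the calibration data rarely excites.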