• full-parameter finetuning changes every parameter
  • LoRA can be seen as a generalization of finetuning that introduces two knobs
    • how many parameters we update (specifically, which weight matrices)
    • how much we update each of those matrices (i.e. by varying the rank r)
  • So, during finetuning, for each matrix W we want to update, we introduce another residual path with a low-rank update ΔW = AB (where A is d×r and B is r×d), and optimize only A and B by gradient descent (see the first sketch after this list)
  • At inference time, we can just add AB into the original matrix W (so inference cost is unchanged), and we can switch between different LoRAs, depending on the task, just by subtracting the update back out (second sketch below)
  • One can also stack multiple LoRAs together by summing their low-rank updates, and this tends to work in practice
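
A minimal sketch of the residual low-rank path, in plain numpy. The class name `LoRALinear`, the `alpha` scaling factor, and the initialization choices are illustrative assumptions, not a reference implementation; the point is that W stays frozen and only A (d×r) and B (r×d) would be handed to the optimizer.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank residual path (sketch)."""

    def __init__(self, W: np.ndarray, r: int, alpha: float = 1.0):
        d_out, d_in = W.shape
        self.W = W                                   # pretrained weight, frozen
        self.A = np.random.randn(d_out, r) * 0.01    # trainable, d x r
        self.B = np.zeros((r, d_in))                 # trainable, r x d; zero init so ΔW starts at 0
        self.alpha = alpha                           # hypothetical scaling knob for the update

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Base path plus low-rank residual path; ΔW = A @ B is never materialized,
        # we just route x through B then A.
        return x @ self.W.T + self.alpha * (x @ self.B.T @ self.A.T)
```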
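And a small sketch of merging a LoRA into W for inference, switching tasks by subtracting the update back out, and stacking several adapters by summing their updates. The helper names `merge`/`unmerge` and the adapter tuples are made up for illustration.

```python
import numpy as np

d, r = 8, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))                    # pretrained weight

# Each adapter is an (A, B) pair: A is d x r, B is r x d
task_a = (rng.standard_normal((d, r)), rng.standard_normal((r, d)))
task_b = (rng.standard_normal((d, r)), rng.standard_normal((r, d)))

def merge(W, adapter):
    A, B = adapter
    return W + A @ B          # bake the update in: same inference cost as the base model

def unmerge(W, adapter):
    A, B = adapter
    return W - A @ B          # subtract it back out to switch tasks

# Switch from task A to task B without ever touching the base weights
W_a = merge(W, task_a)
W_b = merge(unmerge(W_a, task_a), task_b)

# Stack several LoRAs by summing their low-rank updates
W_stacked = merge(merge(W, task_a), task_b)

# Unmerging both stacked adapters recovers the original W
assert np.allclose(unmerge(unmerge(W_stacked, task_b), task_a), W)
```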