- full-parameter finetuning changes every parameter
- LoRA generalizes finetuning by introducing two knobs:
  - how many parameters (specifically, which matrices) we update
  - how much we update those matrices (by varying the rank r)
- So, during finetuning the original weights stay frozen, and for each matrix we want to update we introduce a residual path carrying a rank-r update (written as W* = AB, where A is d×r and B is r×d), which we optimize by gradient descent (see the sketch after this list)
- At inference time, we can just add W* into the original matrix (so no extra inference latency), and we can switch between different LoRAs depending on the task just by subtracting W* back out (the merge/unmerge steps in the sketch below)
- One can stack multiple LoRAs together (the updates are additive), and it tends to work in practice (second sketch below)
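
A minimal sketch of the residual path and the merge/unmerge trick, assuming PyTorch; the `LoRALinear` name, the `alpha / r` scaling, and the init scheme are illustrative choices, not something fixed by the notes above:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank residual path W* = A @ B."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # only A and B get gradients

        d_out, d_in = base.weight.shape                    # nn.Linear stores weight as (d_out, d_in)
        self.A = nn.Parameter(torch.randn(d_in, r) * 0.01) # d x r
        self.B = nn.Parameter(torch.zeros(r, d_out))       # r x d, zero init => W* = 0 at the start
        self.scale = alpha / r
        self.merged = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if not self.merged:
            out = out + (x @ self.A @ self.B) * self.scale # the residual low-rank path
        return out

    @torch.no_grad()
    def merge(self):
        # fold W* into the base weight: a single matmul at inference, no extra latency
        if not self.merged:
            self.base.weight += (self.A @ self.B).T * self.scale
            self.merged = True

    @torch.no_grad()
    def unmerge(self):
        # subtract W* back out, e.g. to swap in a different task's LoRA
        if self.merged:
            self.base.weight -= (self.A @ self.B).T * self.scale
            self.merged = False

# usage: wrap a layer, train only A and B, then merge for inference
layer = LoRALinear(nn.Linear(768, 768), r=8)
nn.init.normal_(layer.B, std=0.02)                 # stand-in for a trained adapter
x = torch.randn(2, 768)
y = layer(x)
layer.merge()
assert torch.allclose(y, layer(x), atol=1e-4)      # same outputs after merging
layer.unmerge()                                    # back to the pristine base weights
```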
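
And since each W* is just an additive low-rank matrix, stacking amounts to merging several of them into the same base weight. A sketch with random stand-in adapters (not trained ones):

```python
import torch
import torch.nn as nn

d, r = 768, 8
base = nn.Linear(d, d)

# two stand-in adapters, e.g. trained on different tasks (random here for illustration)
adapters = [
    (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01),
    (torch.randn(d, r) * 0.01, torch.randn(r, d) * 0.01),
]

with torch.no_grad():
    for A, B in adapters:
        base.weight += (A @ B).T   # each W* adds on top of the last; subtract one to remove it
```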