
LoRA: Low-Rank Adaptation

Paper: LoRA: Low-Rank Adaptation of Large Language Models
GitHub with examples: LoRA

Figure: LoRA reparametrization (the pretrained weights W are frozen; only the low-rank matrices A and B are trained).

  • Low-Rank Adaptation (LoRA) freezes the pretrained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks (see the sketch after this list).
  • Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce
      • the number of trainable parameters by 10,000 times and
      • the GPU memory requirement by 3 times.
  • LoRA performs on par with or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having
      • fewer trainable parameters,
      • higher training throughput, and,
      • unlike adapters, no additional inference latency.
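
To make the mechanism concrete, below is a minimal sketch of the LoRA reparametrization in PyTorch, applied to a plain linear layer. The class name LoRALinear and the hyperparameters r and lora_alpha are illustrative, not taken from the linked repository. The frozen weight W is augmented by a trainable low-rank product B A, and merging B A back into W after training is what removes any extra inference latency.

```python
# Minimal sketch of the LoRA reparametrization, assuming PyTorch.
# Names (LoRALinear, r, lora_alpha) are illustrative, not from the official loralib.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer augmented by a trainable low-rank update B @ A."""

    def __init__(self, pretrained_weight: torch.Tensor, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        out_features, in_features = pretrained_weight.shape
        # Pretrained weight: frozen, never updated during fine-tuning.
        self.weight = nn.Parameter(pretrained_weight.clone(), requires_grad=False)
        # Rank-decomposition matrices: the only trainable parameters.
        # A starts with small random values and B with zeros, so the update B @ A
        # is zero at the beginning and training starts from the pretrained model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = x W^T + scaling * x A^T B^T, i.e. W is effectively replaced by W + scaling * B A.
        frozen = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T * self.scaling
        return frozen + update

    @torch.no_grad()
    def merge(self) -> None:
        # Fold B @ A into W after training so inference is a single matmul:
        # this is why LoRA adds no extra latency, unlike adapter layers.
        self.weight += self.scaling * (self.lora_B @ self.lora_A)
```

For example, wrapping a 768 x 768 projection with r = 8 leaves only 2 * 8 * 768 = 12,288 trainable parameters out of 589,824 frozen weights, which is where the large savings in trainable parameters and optimizer memory come from.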