LoRA & QLoRA in Practice

LoRA Configuration

Understanding LoRA parameters is crucial for successful fine-tuning. Let's explore each parameter and its impact.

Core LoRA Parameters

The LoraConfig Object

from peft import LoraConfig, TaskType

config = LoraConfig(
    r=16,                           # Rank
    lora_alpha=32,                  # Alpha scaling
    target_modules="all-linear",    # Which layers
    lora_dropout=0.05,              # Dropout
    bias="none",                    # Bias training
    task_type=TaskType.CAUSAL_LM    # Task type
)

Rank (r)

The rank determines the size of the LoRA matrices and directly affects capacity.

Original weight: W ∈ R^(d×k)
LoRA adds: A ∈ R^(d×r), B ∈ R^(r×k)
Output: W' = W + (A × B)

Trainable params per layer = r × (d + k)
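
To make the formula concrete, here is a small sketch; the dimensions (d = k = 4096, r = 16) are illustrative values, not taken from any particular model.

# Parameter count for one LoRA-adapted layer (illustrative dimensions)
d, k, r = 4096, 4096, 16          # frozen weight is d x k, LoRA rank is r

full_params = d * k               # parameters in the frozen weight W
lora_params = r * (d + k)         # parameters in A (d x r) plus B (r x k)

print(f"Frozen weight params: {full_params:,}")                   # 16,777,216
print(f"LoRA params:          {lora_params:,}")                   # 131,072
print(f"Fraction trainable:   {lora_params / full_params:.2%}")   # 0.78%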

Rank Selection Guide

Rank | Parameters | Memory  | Use Case
-----|------------|---------|----------------------
4    | Very few   | Minimal | Simple style changes
8    | Few        | Low     | Basic tasks
16   | Moderate   | Medium  | Default choice
32   | Many       | Higher  | Complex tasks
64+  | Very many  | High    | Maximum capacity

# Low rank for simple tasks
simple_config = LoraConfig(r=8, ...)

# Higher rank for complex domain knowledge
complex_config = LoraConfig(r=32, ...)

Alpha (lora_alpha)

Alpha is a scaling factor that controls how much the LoRA update affects the output.

Scaling = alpha / r
Output = W + (alpha/r) × (A × B)

Common Patterns

# Pattern 1: Alpha = Rank (scaling = 1)
config = LoraConfig(r=16, lora_alpha=16)

# Pattern 2: Alpha = 2×Rank (scaling = 2)
config = LoraConfig(r=16, lora_alpha=32)

# Pattern 3: Fixed alpha (adjust with rank)
config = LoraConfig(r=32, lora_alpha=16)  # scaling = 0.5

Rule of thumb: Start with alpha = rank (scaling = 1), then adjust based on training stability.

Target Modules

The target_modules setting controls which layers receive LoRA adapters. This significantly affects both capacity and memory.

Common Options

# All linear layers (recommended for 2025)
config = LoraConfig(target_modules="all-linear")

# Attention only (traditional approach)
config = LoraConfig(target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

# Attention + MLP
config = LoraConfig(target_modules=[
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
])

Finding Target Modules

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Print all linear layer names
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name)

Example output for Llama (truncated; only nn.Linear modules are listed):

model.layers.0.self_attn.q_proj
model.layers.0.self_attn.k_proj
model.layers.0.self_attn.v_proj
model.layers.0.self_attn.o_proj
model.layers.0.mlp.gate_proj
model.layers.0.mlp.up_proj
model.layers.0.mlp.down_proj
...
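
Reusing the model loaded above, a small sketch (plain Python, not a PEFT helper) collects the unique layer-name suffixes, which is the form LoraConfig matches target_modules against:

# Collect the unique suffixes of the linear layers; target_modules
# matches module names against these suffixes.
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
}
linear_names.discard("lm_head")  # the output head is usually not adapted
print(sorted(linear_names))
# e.g. ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']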

Dropout

lora_dropout applies dropout to the LoRA path, providing regularization that helps prevent overfitting.

# No dropout (for larger datasets)
config = LoraConfig(lora_dropout=0.0)

# Light dropout (for smaller datasets)
config = LoraConfig(lora_dropout=0.05)

# Higher dropout (for very small datasets or overfitting)
config = LoraConfig(lora_dropout=0.1)

Bias Training

Whether to train the bias terms along with LoRA.

# None (default, recommended)
config = LoraConfig(bias="none")

# All biases
config = LoraConfig(bias="all")

# Only LoRA biases
config = LoraConfig(bias="lora_only")

Recommendation: Keep bias="none" unless you have a specific reason to train biases.

Complete Configuration Examples

Balanced Default

from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM"
)

Memory Constrained

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # Fewer modules
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM"
)

Maximum Capacity

config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules="all-linear",
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

Applying LoRA to a Model

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Create LoRA config
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, config)

# Check trainable parameters
model.print_trainable_parameters()
# Prints the trainable vs. total parameter counts; with r=16 and all-linear
# targets, roughly 1% of the model's parameters are trainable.
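
As a sanity check, the same counts can be recomputed directly from the model's parameters; this sketch assumes model is the PEFT-wrapped model from the snippet above.

# Recompute the counts reported by print_trainable_parameters()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")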

Tuning Tips

Issue         | Solution
--------------|----------------------------------------
Underfitting  | Increase rank, add more target modules
Overfitting   | Decrease rank, add dropout
Memory issues | Decrease rank, use fewer target modules
Slow training | Lower rank, use QLoRA
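
As an illustration of the tips above, the sketches below adjust the configuration in the two capacity directions; the specific values are starting points, not prescriptions.

from peft import LoraConfig

# Underfitting: add capacity (higher rank, more target modules)
underfit_fix = LoraConfig(r=32, lora_alpha=32, target_modules="all-linear",
                          task_type="CAUSAL_LM")

# Overfitting: reduce capacity and add regularization (dropout)
overfit_fix = LoraConfig(r=8, lora_alpha=8, lora_dropout=0.1,
                         target_modules="all-linear", task_type="CAUSAL_LM")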

Next, we'll add 4-bit quantization with QLoRA to dramatically reduce memory requirements.
