LoRA & QLoRA in Practice
LoRA Configuration
Understanding LoRA parameters is crucial for successful fine-tuning. Let's explore each parameter and its impact.
Core LoRA Parameters
The LoraConfig Object
from peft import LoraConfig, TaskType
config = LoraConfig(
r=16, # Rank
lora_alpha=32, # Alpha scaling
target_modules="all-linear", # Which layers
lora_dropout=0.05, # Dropout
bias="none", # Bias training
task_type=TaskType.CAUSAL_LM # Task type
)
Rank (r)
The rank determines the size of the LoRA matrices and directly affects capacity.
Original weight: W ∈ R^(d×k)
LoRA adds: A ∈ R^(d×r), B ∈ R^(r×k)
Output: W' = W + (A × B)
Trainable params per adapted layer = r × (d + k)
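As a quick sanity check, here is a minimal sketch in plain Python of that parameter count for a single adapted layer; the 3072×3072 projection shape is an illustrative assumption, not tied to a specific model.
def lora_param_count(d, k, r):
    # A contributes d*r parameters, B contributes r*k
    return r * (d + k)

# Hypothetical 3072x3072 projection
print(lora_param_count(3072, 3072, r=16))  # 98304
print(lora_param_count(3072, 3072, r=64))  # 393216 -- 4x the rank, 4x the parameters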
Rank Selection Guide
| Rank | Parameters | Memory | Use Case |
|---|---|---|---|
| 4 | Very few | Minimal | Simple style changes |
| 8 | Few | Low | Basic tasks |
| 16 | Moderate | Medium | Default choice |
| 32 | Many | Higher | Complex tasks |
| 64+ | Very many | High | Maximum capacity |
# Low rank for simple tasks
simple_config = LoraConfig(r=8, ...)
# Higher rank for complex domain knowledge
complex_config = LoraConfig(r=32, ...)
Alpha (lora_alpha)
Alpha is a scaling factor that controls how much the LoRA update affects the output.
Scaling = alpha / r
Output = W + (alpha/r) × (A × B)
Common Patterns
# Pattern 1: Alpha = Rank (scaling = 1)
config = LoraConfig(r=16, lora_alpha=16)
# Pattern 2: Alpha = 2×Rank (scaling = 2)
config = LoraConfig(r=16, lora_alpha=32)
# Pattern 3: Fixed alpha (adjust with rank)
config = LoraConfig(r=32, lora_alpha=16) # scaling = 0.5
Rule of thumb: Start with alpha = rank (scaling = 1), then adjust based on training stability.
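To make the scaling concrete, here is a minimal sketch that applies the formula above directly in PyTorch; the dimensions are illustrative, and it mirrors the math rather than PEFT's internal implementation.
import torch

d, k, r, alpha = 3072, 3072, 16, 32
W = torch.randn(d, k)          # frozen base weight, W in R^(d x k)
A = torch.randn(d, r) * 0.01   # LoRA A in R^(d x r), trainable
B = torch.zeros(r, k)          # LoRA B in R^(r x k), trainable, initialized to zero

# Effective weight after the scaled low-rank update
W_prime = W + (alpha / r) * (A @ B)

# B starts at zero, so the adapted model initially matches the base model
print(torch.allclose(W_prime, W))  # True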
Target Modules
Which layers to add LoRA adapters to. This significantly affects both capacity and memory.
Common Options
# All linear layers (recommended for 2025)
config = LoraConfig(target_modules="all-linear")
# Attention only (traditional approach)
config = LoraConfig(target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
# Attention + MLP
config = LoraConfig(target_modules=[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
])
Finding Target Modules
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
# Print all linear layer names
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name)
Output for Llama:
model.layers.0.self_attn.q_proj
model.layers.0.self_attn.k_proj
model.layers.0.self_attn.v_proj
model.layers.0.self_attn.o_proj
model.layers.0.mlp.gate_proj
model.layers.0.mlp.up_proj
model.layers.0.mlp.down_proj
...
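Since the same few projection names repeat in every layer, a small helper (a convenience sketch, not a PEFT utility) can collapse the listing to the unique suffixes you would pass as target_modules, reusing the model loaded above:
# Unique final name components of all Linear layers
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
}
print(sorted(linear_names))
# ['down_proj', 'gate_proj', 'k_proj', 'lm_head', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
# lm_head is usually left out of target_modules (the "all-linear" shortcut also skips it)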
Dropout
Regularization to prevent overfitting.
# No dropout (for larger datasets)
config = LoraConfig(lora_dropout=0.0)
# Light dropout (for smaller datasets)
config = LoraConfig(lora_dropout=0.05)
# Higher dropout (for very small datasets or overfitting)
config = LoraConfig(lora_dropout=0.1)
Bias Training
Whether to train the bias terms along with LoRA.
# None (default, recommended)
config = LoraConfig(bias="none")
# All biases
config = LoraConfig(bias="all")
# Only LoRA biases
config = LoraConfig(bias="lora_only")
Recommendation: Keep bias="none" unless you have a specific reason to train biases.
Complete Configuration Examples
General Purpose (Recommended Default)
from peft import LoraConfig
config = LoraConfig(
r=16,
lora_alpha=16,
target_modules="all-linear",
lora_dropout=0.0,
bias="none",
task_type="CAUSAL_LM"
)
Memory Constrained
config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "v_proj"], # Fewer modules
lora_dropout=0.0,
bias="none",
task_type="CAUSAL_LM"
)
Maximum Capacity
config = LoraConfig(
r=64,
lora_alpha=128,
target_modules="all-linear",
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
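For a rough sense of how these choices trade off, the sketch below estimates adapter sizes for the three configurations, assuming Llama-3.2-3B-style dimensions (hidden size 3072, MLP size 8192, grouped-query key/value width 1024, 28 decoder layers); treat the numbers as back-of-the-envelope estimates, not exact counts.
hidden, inter, kv, layers = 3072, 8192, 1024, 28

# (in_features, out_features) of each adapted projection in one decoder layer
shapes = {
    "q_proj": (hidden, hidden), "k_proj": (hidden, kv), "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden), "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter), "down_proj": (inter, hidden),
}

def adapter_params(r, modules):
    return layers * sum(r * (d + k) for name, (d, k) in shapes.items() if name in modules)

print(adapter_params(16, list(shapes)))            # ~24M  (general purpose)
print(adapter_params(8, ["q_proj", "v_proj"]))     # ~2.3M (memory constrained)
print(adapter_params(64, list(shapes)))            # ~97M  (maximum capacity)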
Applying LoRA to a Model
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig
# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
# Create LoRA config
config = LoraConfig(
r=16,
lora_alpha=16,
target_modules="all-linear",
lora_dropout=0.0,
bias="none",
task_type="CAUSAL_LM"
)
# Apply LoRA
model = get_peft_model(model, config)
# Check trainable parameters
model.print_trainable_parameters()
# Example output (exact numbers vary by model and PEFT version):
# trainable params: ~24M || all params: ~3.2B || trainable%: ~0.75%
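Once training finishes, only the adapter weights need to be saved and shipped; the snippet below uses standard PEFT save/load calls, with "./lora-adapter" as a placeholder path.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Save just the LoRA adapter (tens of MB, not the full 3B model)
model.save_pretrained("./lora-adapter")

# Later: reload the frozen base model and attach the adapter
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base, "./lora-adapter")

# Optionally merge the adapter into the base weights for deployment
merged = model.merge_and_unload()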
Tuning Tips
| Issue | Solution |
|---|---|
| Underfitting | Increase rank, add more target modules |
| Overfitting | Decrease rank, add dropout |
| Memory issues | Decrease rank, fewer target modules |
| Slow training | Lower rank, use QLoRA |
Next, we'll add 4-bit quantization with QLoRA to dramatically reduce memory requirements.