LoRA & QLoRA in Practice

LoRA Configuration

Understanding LoRA parameters is crucial for successful fine-tuning. Let's explore each parameter and its impact.

Core LoRA Parameters

The LoraConfig Object

from peft import LoraConfig, TaskType

config = LoraConfig(
    r=16,                           # Rank
    lora_alpha=32,                  # Alpha scaling
    target_modules="all-linear",    # Which layers
    lora_dropout=0.05,              # Dropout
    bias="none",                    # Bias training
    task_type=TaskType.CAUSAL_LM    # Task type
)

Rank (r)

The rank determines the size of the LoRA matrices and directly affects capacity.

Original weight: W ∈ R^(d×k)
LoRA adds: A ∈ R^(d×r), B ∈ R^(r×k)
Output: W' = W + (A × B)

Trainable params = r × (d + k)
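
A minimal sketch of this formula in code (the 3072 × 3072 layer size is purely illustrative, not taken from any particular model):

def lora_param_count(d: int, k: int, r: int) -> int:
    # A contributes d*r parameters, B contributes r*k
    return r * (d + k)

# Rank 16 on a 3072 x 3072 projection
print(lora_param_count(d=3072, k=3072, r=16))   # 98304
# Rank 64 on the same layer: capacity and memory grow linearly with r
print(lora_param_count(d=3072, k=3072, r=64))   # 393216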

Rank Selection Guide

Rank | Parameters | Memory  | Use Case
-----|------------|---------|---------------------
4    | Very few   | Minimal | Simple style changes
8    | Few        | Low     | Basic tasks
16   | Moderate   | Medium  | Default choice
32   | Many       | Higher  | Complex tasks
64+  | Very many  | High    | Maximum capacity

# Low rank for simple tasks
simple_config = LoraConfig(r=8, ...)

# Higher rank for complex domain knowledge
complex_config = LoraConfig(r=32, ...)

Alpha (lora_alpha)

Alpha is a scaling factor that controls how much the LoRA update affects the output.

Scaling = alpha / r
Output = W + (alpha/r) × (A × B)
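
To make the scaling concrete, here is a tiny sketch of the update in plain PyTorch (random tensors, arbitrary dimensions; B starts at zero, mirroring the standard LoRA trick of starting the update at zero):

import torch

d, k, r, alpha = 64, 64, 16, 32

W = torch.randn(d, k)          # frozen pretrained weight
A = torch.randn(d, r) * 0.01   # LoRA matrix A (d x r)
B = torch.zeros(r, k)          # LoRA matrix B (r x k), starts at zero

scaling = alpha / r            # 32 / 16 = 2.0
W_effective = W + scaling * (A @ B)

# Because B starts at zero, the effective weight equals W before any training
print(torch.allclose(W_effective, W))  # True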

Common Patterns

# Pattern 1: Alpha = Rank (scaling = 1)
config = LoraConfig(r=16, lora_alpha=16)

# Pattern 2: Alpha = 2×Rank (scaling = 2)
config = LoraConfig(r=16, lora_alpha=32)

# Pattern 3: Fixed alpha (adjust with rank)
config = LoraConfig(r=32, lora_alpha=16)  # scaling = 0.5

Rule of thumb: Start with alpha = rank (scaling = 1), then adjust based on training stability.

Target Modules

Which layers to add LoRA adapters to. This significantly affects both capacity and memory.

Common Options

# All linear layers (recommended default as of 2026)
config = LoraConfig(target_modules="all-linear")

# Attention only (traditional approach)
config = LoraConfig(target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

# Attention + MLP
config = LoraConfig(target_modules=[
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
])

Finding Target Modules

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Print all linear layer names
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name)

Output for Llama:

model.layers.0.self_attn.q_proj
model.layers.0.self_attn.k_proj
model.layers.0.self_attn.v_proj
model.layers.0.self_attn.o_proj
model.layers.0.mlp.gate_proj
model.layers.0.mlp.up_proj
model.layers.0.mlp.down_proj
...
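
If you only want the short module names that LoraConfig expects (rather than the full dotted paths), a small sketch like the following distills them; the exact list depends on the model:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Collect the unique leaf names of every Linear layer, e.g. "q_proj", "down_proj"
linear_names = {
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)
}
print(sorted(linear_names))
# e.g. ['down_proj', 'gate_proj', 'k_proj', 'lm_head', 'o_proj', 'q_proj', 'up_proj', 'v_proj']

Note that lm_head (the output head) typically shows up in this list; the "all-linear" shortcut is designed to skip it, and you would normally leave it out of a hand-written target_modules list as well.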

Dropout

Regularization to prevent overfitting.

# No dropout (for larger datasets)
config = LoraConfig(lora_dropout=0.0)

# Light dropout (for smaller datasets)
config = LoraConfig(lora_dropout=0.05)

# Higher dropout (for very small datasets or overfitting)
config = LoraConfig(lora_dropout=0.1)

Bias Training

Whether to train the bias terms along with LoRA.

# None (default, recommended)
config = LoraConfig(bias="none")

# All biases
config = LoraConfig(bias="all")

# Only LoRA biases
config = LoraConfig(bias="lora_only")

Recommendation: Keep bias="none" unless you have a specific reason to train biases.

Complete Configuration Examples

Balanced Default

from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM"
)

Memory Constrained

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # Fewer modules
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM"
)

Maximum Capacity

config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules="all-linear",
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

Applying LoRA to a Model

from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Create LoRA config
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, config)

# Check trainable parameters
model.print_trainable_parameters()
# Output: trainable params: 41,943,040 || all params: 3,255,044,096 || trainable%: 1.29%
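
Once the adapter is trained, only the LoRA weights need to be saved; they are a small fraction of the full model. A minimal sketch (the ./lora-adapter path is just an example):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Save just the adapter weights, not the 3B base model
model.save_pretrained("./lora-adapter")

# Later: reload the base model and attach the trained adapter
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base, "./lora-adapter")

# Optionally merge the adapter into the base weights for deployment
merged = model.merge_and_unload()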

Tuning Tips

Issue         | Solution
--------------|----------------------------------------
Underfitting  | Increase rank, add more target modules
Overfitting   | Decrease rank, add dropout
Memory issues | Decrease rank, fewer target modules
Slow training | Lower rank, use QLoRA

Next, we'll add 4-bit quantization with QLoRA to dramatically reduce memory requirements.
