LoRA & QLoRA in Practice
Setting Up the Environment
Before we start fine-tuning, let's set up a proper environment with all the required libraries.
Required Libraries
Here are the core libraries for fine-tuning in 2025 (a quick version check follows the table):
| Library | Purpose | Version |
|---|---|---|
| transformers | Model loading and tokenization | ≥4.46.0 |
| peft | LoRA and adapter methods | ≥0.13.0 |
| trl | Training (SFTTrainer, DPOTrainer) | ≥0.12.0 |
| bitsandbytes | 4-bit quantization | ≥0.44.0 |
| datasets | Dataset loading and processing | ≥3.0.0 |
| accelerate | Distributed training | ≥1.0.0 |
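If you want to confirm that an existing environment meets these minimums before running the fuller verification script below, a quick check like this works (MINIMUMS is just an illustrative constant mirroring the table):
from importlib.metadata import version, PackageNotFoundError
MINIMUMS = {
    "transformers": "4.46.0",
    "peft": "0.13.0",
    "trl": "0.12.0",
    "bitsandbytes": "0.44.0",
    "datasets": "3.0.0",
    "accelerate": "1.0.0",
}
for package, minimum in MINIMUMS.items():
    try:
        print(f"{package}: installed {version(package)}, need >= {minimum}")
    except PackageNotFoundError:
        print(f"{package}: NOT INSTALLED")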
Installation
Basic Installation
pip install transformers peft trl datasets accelerate bitsandbytes
With CUDA Support (Recommended)
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install transformers peft trl datasets accelerate bitsandbytes
Full Installation with Extras
pip install "transformers[torch]" peft trl datasets accelerate bitsandbytes
pip install wandb # For experiment tracking
pip install flash-attn --no-build-isolation # Optional: faster attention
Verify Installation
Run this script to verify everything is working:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
import transformers
print(f"Transformers version: {transformers.__version__}")
import peft
print(f"PEFT version: {peft.__version__}")
import trl
print(f"TRL version: {trl.__version__}")
import bitsandbytes
print(f"bitsandbytes version: {bitsandbytes.__version__}")
Expected output:
PyTorch version: 2.5.1+cu121
CUDA available: True
CUDA version: 12.1
GPU: NVIDIA GeForce RTX 4090
VRAM: 24.0 GB
Transformers version: 4.46.0
PEFT version: 0.13.0
TRL version: 0.12.0
bitsandbytes version: 0.44.0
Environment Options
Local GPU
Pros: Full control, no time limits.
Cons: Hardware investment required.
Minimum specs for QLoRA (a rough VRAM estimate follows the list):
- GPU: 8GB VRAM (RTX 3070 or better)
- RAM: 32GB
- Storage: 100GB SSD
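A rough, back-of-the-envelope way to sanity-check whether a model fits in that budget: 4-bit weights take about 0.5 bytes per parameter, plus overhead for LoRA adapters, optimizer states, activations, and CUDA buffers. The 4 GB overhead below is an assumption that varies with sequence length and batch size:
def estimate_qlora_vram_gb(num_params_billion, overhead_gb=4.0):
    # 4-bit weights: ~0.5 bytes per parameter; overhead_gb is a rough
    # allowance for LoRA adapters, optimizer states, activations, buffers
    weights_gb = num_params_billion * 0.5
    return weights_gb + overhead_gb

for size in (3, 8, 13):
    print(f"{size}B model: ~{estimate_qlora_vram_gb(size):.1f} GB VRAM (QLoRA, rough)")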
Google Colab
Pros: Free tier available, easy to start.
Cons: Session time limits, variable GPU availability.
# Check Colab GPU
!nvidia-smi
Cloud Providers
| Provider | GPU Options | Cost |
|---|---|---|
| RunPod | A100, 4090, A6000 | $0.40-2.00/hr |
| Lambda Labs | A100, H100 | $1.10-3.00/hr |
| Vast.ai | Various | $0.20-1.50/hr |
| AWS SageMaker | P4d, G5 | $1.50-5.00/hr |
Project Structure
Organize your fine-tuning project:
my-fine-tuning-project/
├── data/
│ ├── train.json
│ └── validation.json
├── configs/
│ └── lora_config.yaml
├── scripts/
│ ├── train.py
│ └── evaluate.py
├── outputs/
│ ├── checkpoints/
│ └── final_model/
├── requirements.txt
└── README.md
Configuration File
Create a config file for reproducibility:
# configs/lora_config.yaml
model:
  name: "meta-llama/Llama-3.2-3B-Instruct"
  max_seq_length: 2048

lora:
  r: 16
  alpha: 16
  dropout: 0.0
  target_modules: "all-linear"

training:
  batch_size: 4
  gradient_accumulation_steps: 4
  learning_rate: 2e-4
  num_epochs: 3
  warmup_ratio: 0.03

quantization:
  load_in_4bit: true
  bnb_4bit_quant_type: "nf4"
  bnb_4bit_compute_dtype: "bfloat16"
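A minimal sketch of how scripts/train.py might consume this file, assuming PyYAML is installed (pip install pyyaml) and mapping the config sections onto PEFT and bitsandbytes config objects:
import torch
import yaml
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# Load the YAML config written above
with open("configs/lora_config.yaml") as f:
    cfg = yaml.safe_load(f)

lora_config = LoraConfig(
    r=cfg["lora"]["r"],
    lora_alpha=cfg["lora"]["alpha"],
    lora_dropout=cfg["lora"]["dropout"],
    target_modules=cfg["lora"]["target_modules"],
    task_type="CAUSAL_LM",
)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=cfg["quantization"]["load_in_4bit"],
    bnb_4bit_quant_type=cfg["quantization"]["bnb_4bit_quant_type"],
    bnb_4bit_compute_dtype=getattr(torch, cfg["quantization"]["bnb_4bit_compute_dtype"]),
)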
Hugging Face Setup
Gated models such as Llama require authentication with a Hugging Face account:
# Install CLI
pip install huggingface_hub
# Login (get token from huggingface.co/settings/tokens)
huggingface-cli login
Or in Python:
from huggingface_hub import login
login(token="your_token_here")
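To keep the token out of your source code, read it from an environment variable instead; huggingface_hub recognizes HF_TOKEN, and login() without a token falls back to an interactive prompt:
import os
from huggingface_hub import login
login(token=os.environ.get("HF_TOKEN"))  # prompts interactively if HF_TOKEN is unset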
Common Setup Issues
CUDA Out of Memory
# Reduce batch size
batch_size = 2
# Enable gradient checkpointing
model.gradient_checkpointing_enable()
# Use smaller max_seq_length
max_seq_length = 1024
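Where these settings actually go depends on your training setup; with trl's SFTConfig (argument names as of trl ≥0.12, a sketch rather than a full script) the same fixes look like:
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="outputs/checkpoints",
    per_device_train_batch_size=2,   # reduced batch size
    gradient_accumulation_steps=8,   # keep the effective batch size up
    gradient_checkpointing=True,     # trade compute for memory
    max_seq_length=1024,             # shorter sequences
)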
bitsandbytes Issues on Windows
# bitsandbytes >=0.43.0 ships official Windows wheels, so upgrade the official
# package rather than installing the outdated third-party bitsandbytes-windows fork
pip install -U bitsandbytes
Flash Attention Not Available
# Fall back to standard attention
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="eager",  # or "sdpa"
)
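If you prefer the script to pick the best available backend automatically, one option (an illustrative pattern, not the only way) is to check whether the flash_attn package is importable:
import importlib.util
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-3.2-3B-Instruct"  # example; substitute your model
# Use FlashAttention 2 if the package is installed, otherwise fall back to SDPA
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation=attn_impl,
)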
Tip: Start with a small model (1B-3B) to verify your setup before moving to larger models.
Next, we'll dive into LoRA configuration and understand each parameter.