Lesson 9 of 24

LoRA & QLoRA in Practice

Setting Up the Environment

3 min read

Before we start fine-tuning, let's set up a proper environment with all the required libraries.

Required Libraries

Here are the core libraries for fine-tuning in 2025:

Library        Purpose                             Version
transformers   Model loading and tokenization      ≥4.46.0
peft           LoRA and adapter methods            ≥0.13.0
trl            Training (SFTTrainer, DPOTrainer)   ≥0.12.0
bitsandbytes   4-bit quantization                  ≥0.44.0
datasets       Dataset loading and processing      ≥3.0.0
accelerate     Distributed training                ≥1.0.0
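
Pinning these minimum versions in a requirements.txt (the file also appears in the project layout later in this lesson) keeps installs reproducible. A minimal sketch based on the table above:

# requirements.txt -- minimum versions from the table above
transformers>=4.46.0
peft>=0.13.0
trl>=0.12.0
bitsandbytes>=0.44.0
datasets>=3.0.0
accelerate>=1.0.0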

Installation

Basic Installation

# Install PyTorch with CUDA support first, then the fine-tuning stack
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install transformers peft trl datasets accelerate bitsandbytes

Full Installation with Extras

pip install transformers[torch] peft trl datasets accelerate bitsandbytes
pip install wandb  # For experiment tracking
pip install flash-attn --no-build-isolation  # Optional: faster attention

Verify Installation

Run this script to verify everything is working:

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

import transformers
print(f"Transformers version: {transformers.__version__}")

import peft
print(f"PEFT version: {peft.__version__}")

import trl
print(f"TRL version: {trl.__version__}")

import bitsandbytes
print(f"bitsandbytes version: {bitsandbytes.__version__}")

Expected output:

PyTorch version: 2.5.1+cu121
CUDA available: True
CUDA version: 12.1
GPU: NVIDIA GeForce RTX 4090
VRAM: 24.0 GB
Transformers version: 4.46.0
PEFT version: 0.13.0
TRL version: 0.12.0
bitsandbytes version: 0.44.0

Environment Options

Local GPU

Pros: Full control, no time limits
Cons: Hardware investment required

Minimum specs for QLoRA:

  • GPU: 8GB VRAM (RTX 3070 or better)
  • RAM: 32GB
  • Storage: 100GB SSD

Google Colab

Pros: Free tier available, easy to start
Cons: Session time limits, variable GPU availability

# Check Colab GPU
!nvidia-smi
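
Because Colab sessions can end before training finishes, it helps to write checkpoints somewhere persistent. A common approach is mounting Google Drive; the output path below is illustrative:

# Keep checkpoints across Colab sessions by saving them to Google Drive
from google.colab import drive
drive.mount("/content/drive")

output_dir = "/content/drive/MyDrive/fine-tuning/outputs"  # illustrative path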

Cloud Providers

Provider        GPU Options         Cost
RunPod          A100, 4090, A6000   $0.40-2.00/hr
Lambda Labs     A100, H100          $1.10-3.00/hr
Vast.ai         Various             $0.20-1.50/hr
AWS SageMaker   P4d, G5             $1.50-5.00/hr

Project Structure

Organize your fine-tuning project:

my-fine-tuning-project/
├── data/
│   ├── train.json
│   └── validation.json
├── configs/
│   └── lora_config.yaml
├── scripts/
│   ├── train.py
│   └── evaluate.py
├── outputs/
│   ├── checkpoints/
│   └── final_model/
├── requirements.txt
└── README.md

Configuration File

Create a config file for reproducibility:

# configs/lora_config.yaml
model:
  name: "meta-llama/Llama-3.2-3B-Instruct"
  max_seq_length: 2048

lora:
  r: 16
  alpha: 16
  dropout: 0.0
  target_modules: "all-linear"

training:
  batch_size: 4
  gradient_accumulation_steps: 4
  learning_rate: 2e-4
  num_epochs: 3
  warmup_ratio: 0.03

quantization:
  load_in_4bit: true
  bnb_4bit_quant_type: "nf4"
  bnb_4bit_compute_dtype: "bfloat16"
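
As a reference for later lessons, here is a minimal sketch of how these YAML values could map onto the corresponding peft and transformers objects. PyYAML is assumed for parsing, and the file path matches the project layout above:

# Sketch: translate configs/lora_config.yaml into library objects
import torch
import yaml  # PyYAML, assumed installed
from peft import LoraConfig
from transformers import BitsAndBytesConfig

with open("configs/lora_config.yaml") as f:
    cfg = yaml.safe_load(f)

# LoRA settings -> peft.LoraConfig
lora_config = LoraConfig(
    r=cfg["lora"]["r"],
    lora_alpha=cfg["lora"]["alpha"],
    lora_dropout=cfg["lora"]["dropout"],
    target_modules=cfg["lora"]["target_modules"],
    task_type="CAUSAL_LM",
)

# Quantization settings -> transformers.BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=cfg["quantization"]["load_in_4bit"],
    bnb_4bit_quant_type=cfg["quantization"]["bnb_4bit_quant_type"],
    bnb_4bit_compute_dtype=getattr(torch, cfg["quantization"]["bnb_4bit_compute_dtype"]),
)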

Hugging Face Setup

Most models require authentication:

# Install CLI
pip install huggingface_hub

# Login (get token from huggingface.co/settings/tokens)
huggingface-cli login

Or in Python:

from huggingface_hub import login
login(token="your_token_here")
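
To keep the token out of your scripts, you can read it from an environment variable instead; HF_TOKEN below is just the variable name used in this sketch:

import os
from huggingface_hub import login

# Read the token from an environment variable instead of hard-coding it
login(token=os.environ["HF_TOKEN"])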

Common Setup Issues

CUDA Out of Memory

# Reduce batch size
batch_size = 2

# Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Use smaller max_seq_length
max_seq_length = 1024
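
In practice these adjustments usually go through the trainer configuration. A minimal sketch assuming trl's SFTConfig, with illustrative values that keep the effective batch size at 16 by accumulating more steps:

# Sketch: applying the memory-saving settings via trl's SFTConfig (assumes trl >= 0.12)
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="outputs/checkpoints",
    per_device_train_batch_size=2,   # reduced batch size
    gradient_accumulation_steps=8,   # 2 x 8 keeps an effective batch size of 16
    gradient_checkpointing=True,     # trade compute for memory
    max_seq_length=1024,             # shorter sequences use less VRAM
)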

bitsandbytes Issues on Windows

# Recent bitsandbytes releases (0.43+) include official Windows wheels
pip install -U bitsandbytes
# Fallback for older setups: community wheels
pip install bitsandbytes-windows

Flash Attention Not Available

# Fall back to standard attention
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="eager"  # or "sdpa"
)
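
If you want the script to adapt automatically, a small sketch that picks the backend based on whether the flash_attn package is importable (attn_impl is just a local variable name used here):

# Choose the attention backend based on whether flash-attn is installed
import importlib.util
from transformers import AutoModelForCausalLM

attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation=attn_impl)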

Tip: Start with a small model (1B-3B) to verify your setup before moving to larger models.

Next, we'll dive into LoRA configuration and understand each parameter.
