Understanding Fine-tuning
Choosing Your Approach
Now that you understand the different fine-tuning methods, let's build a decision framework to help you choose the right approach for your specific situation.
The Decision Tree
START: Do you need to customize a model?
│
├─ NO → Use the base model with good prompting
│
└─ YES → How much VRAM do you have?
│
├─ <8GB → Use API fine-tuning (OpenAI, Together AI)
│
├─ 8-16GB → QLoRA on 7-13B models
│
├─ 16-24GB → QLoRA on 13-34B models
│
└─ 24GB+ → LoRA on smaller models, or QLoRA up to 70B (70B-class needs roughly 48GB even with QLoRA)
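If it helps to see the same logic in code, here is a purely illustrative helper; the function name is made up and the thresholds simply restate the tree and the hardware table below.

def recommend_approach(vram_gb: float) -> str:
    """Map available GPU VRAM (in GB) to a starting fine-tuning approach."""
    if vram_gb < 8:
        return "API fine-tuning (OpenAI, Together AI)"
    if vram_gb < 16:
        return "QLoRA on 7-13B models"
    if vram_gb < 24:
        return "QLoRA on 13-34B models"
    if vram_gb < 48:
        return "QLoRA up to ~34B, or LoRA on smaller models"
    return "QLoRA on 70B, or LoRA / full fine-tuning on smaller models"

print(recommend_approach(24))  # -> "QLoRA up to ~34B, or LoRA on smaller models"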
Key Questions to Ask
1. What's Your Hardware?
| VRAM | Recommended Approach | Max Model Size |
|---|---|---|
| 8GB | QLoRA + Unsloth | 7B |
| 16GB | QLoRA + Unsloth | 13B |
| 24GB | QLoRA or LoRA | ~34B (QLoRA) |
| 48GB+ | QLoRA, LoRA, or Full FT | 70B (QLoRA) |
2. What's Your Goal?
| Goal | Recommended Method | Why |
|---|---|---|
| Teach domain knowledge | SFT | Trains on instruction-response pairs |
| Improve response quality | SFT + DPO | DPO aligns outputs to preferences |
| Change output format | SFT | Easy to learn structured outputs |
| Reduce harmful outputs | DPO | Preference learning is ideal |
| Maximum customization | Full fine-tuning | Updates all parameters |
3. How Much Data Do You Have?
| Dataset Size | Recommendation |
|---|---|
| <100 examples | Use few-shot prompting instead |
| 100-1,000 | SFT with LoRA, watch for overfitting |
| 1,000-10,000 | SFT with LoRA, optimal range |
| 10,000+ | SFT + DPO, or consider full fine-tuning |
Practical Recommendations
For Most Users: QLoRA + SFT
# The "safe default" configuration
from peft import LoraConfig
from transformers import BitsAndBytesConfig
# 4-bit quantization
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
# LoRA config
lora_config = LoraConfig(
r=16,
lora_alpha=16,
target_modules="all-linear",
lora_dropout=0.0
)
Why this works:
- Fits on consumer GPUs (8-24GB)
- Excellent quality for most tasks
- Fast training (hours, not days)
- Easy to iterate and experiment
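To see how the two configs above plug into an actual training run, here is a minimal sketch using Transformers and TRL. The model name and data file are placeholders, and the exact SFTTrainer arguments can vary slightly between TRL versions (older releases take tokenizer= instead of processing_class=).

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Placeholder base model and training file -- swap in your own.
# The dataset should contain a "text" or "messages" column for SFTTrainer.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Load the base model in 4-bit using bnb_config from the snippet above
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# SFTTrainer attaches the LoRA adapters when given peft_config
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    processing_class=tokenizer,
    args=SFTConfig(output_dir="qlora-sft", num_train_epochs=2),
)
trainer.train()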
For Speed: Unsloth
from unsloth import FastLanguageModel

# Load a 4-bit quantized model through Unsloth's memory-optimized loader
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
Why this works:
- Roughly 2x faster than standard training
- Up to 70% less VRAM usage
- Same quality results
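Unsloth attaches the LoRA adapters through its own helper rather than a separate LoraConfig. A sketch following the pattern in Unsloth's examples (argument names can shift between releases):

# Add LoRA adapters to the Unsloth-loaded model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # trades a little compute for a lot of VRAM
)
# Training then proceeds with the same SFTTrainer setup shown earlier.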
For Quality: SFT + DPO Pipeline
Step 1: SFT (Supervised Fine-Tuning)
Train on instruction-response pairs
↓
Step 2: DPO (Direct Preference Optimization)
Train on preference pairs (chosen/rejected)
↓
Result: Model with both skills AND alignment
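In code, the second stage looks roughly like the sketch below, using TRL's DPOTrainer on the model and tokenizer produced by the SFT step. The preference file is a placeholder and must contain prompt, chosen, and rejected columns; recent TRL releases use processing_class= where older ones used tokenizer=.

from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Placeholder preference data with "prompt", "chosen", and "rejected" columns
pref_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

# `model` and `tokenizer` come from the SFT step. With ref_model=None, TRL
# builds the frozen reference model itself (when training LoRA adapters it
# simply disables them for the reference forward pass).
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=DPOConfig(output_dir="dpo", beta=0.1, num_train_epochs=1),
    train_dataset=pref_dataset,
    processing_class=tokenizer,
)
trainer.train()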
Common Mistakes to Avoid
1. Skipping Data Quality
Wrong: "I'll just throw 50,000 examples at it."
Right: 1,000 high-quality examples beat 50,000 noisy ones.
2. Overfitting on Small Datasets
Wrong: Training for 10 epochs on 500 examples.
Right: Train for 1-3 epochs, use a validation set, and monitor loss.
3. Wrong Model Size
Wrong: Fine-tuning a 70B model when a 7B would suffice.
Right: Start small and scale up only if needed.
4. Ignoring Base Model Choice
Wrong: Fine-tuning any random model.
Right: Choose a base model that is already good at your task type.
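Mistake 2 in particular can be guarded against directly in configuration. A sketch of the relevant training arguments (values are illustrative; eval_strategy was called evaluation_strategy in older Transformers releases):

from trl import SFTConfig

# Conservative settings for a small (roughly 500-1,000 example) dataset
args = SFTConfig(
    output_dir="sft-small",
    num_train_epochs=2,                # stay in the 1-3 epoch range
    eval_strategy="steps",             # evaluate on the held-out split as you go
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    logging_steps=10,                  # watch train vs. eval loss for divergence
    load_best_model_at_end=True,       # keep the checkpoint with the lowest eval loss
    metric_for_best_model="eval_loss",
)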
Quick Start Recommendations
| Your Situation | Do This |
|---|---|
| First time fine-tuning | QLoRA + Unsloth on Llama 3.2 3B |
| Need production quality | QLoRA on Llama 3.1 8B or Mistral 7B |
| Limited VRAM | QLoRA + Unsloth, gradient checkpointing |
| Need best possible quality | LoRA on 70B or full fine-tune |
Summary Checklist
Before you start fine-tuning:
- Defined clear success metrics
- Collected 500+ high-quality training examples
- Chosen base model appropriate for task
- Verified hardware can handle chosen approach
- Prepared validation set (10-20% of data)
- Set up experiment tracking (wandb, mlflow)
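For the validation-set item, the split itself is a one-liner with the datasets library (the file name is a placeholder):

from datasets import load_dataset

# Hold out 10% of the examples for validation
dataset = load_dataset("json", data_files="train.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]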
In the next module, we'll dive deep into preparing your training dataset: the most critical factor for fine-tuning success.