When base models don't quite fit your needs, fine-tuning is the answer. But which approach should you use? Let's break down LoRA (Low-Rank Adaptation) vs full fine-tuning.
Less Resource-Intensive More Resource-Intensive
│ │
▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Prompt │ │ LoRA │ │ QLoRA │ │ Full │
│ Engineering │ │ │ │ (4-bit) │ │ Fine-Tune │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
Free Low Cost Medium Cost High Cost
No Training Minutes-Hours Hours-Days Days-Weeks
LoRA (Low-Rank Adaptation) freezes the original model weights and injects trainable low-rank matrices into each layer. Instead of updating all 7B+ parameters, you only train a small fraction.
Original Model Weights (Frozen)
│
▼
┌─────────┐
│ W₀ │ (frozen, say 4096 x 4096)
└─────────┘
│
├──────────────────────┐
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│ A │ │ B │
│ (4096×r)│ │ (r×4096)│
└─────────┘ └─────────┘
│ │
└──────────┬───────────┘
│
▼
W = W₀ + BA
Where r (rank) is typically 8-64, making trainable parameters < 1% of total.
lora_config = LoraConfig(
r=16, # Rank (8-64 typically)
lora_alpha=32, # Scaling factor (usually 2x rank)
lora_dropout=0.1, # Dropout for regularization
bias="none", # Don't train biases
task_type="CAUSAL_LM", # For text generation
target_modules=[ # Which layers to adapt
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj"
]
)
training_args = TrainingArguments(
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=2e-4,
lr_scheduler_type="cosine",
warmup_ratio=0.1,
logging_steps=10,
save_steps=100,
fp16=True,
)
QLoRA combines LoRA with 4-bit quantization, making it possible to fine-tune large models on consumer hardware.
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
quantization_config=bnb_config,
device_map="auto",
)
| Method | VRAM Required | Time (7B model) | Cost |
|---|---|---|---|
| LoRA | 16-24GB | Hours | $ |
| QLoRA | 8-16GB | Hours | $ |
| Full Fine-Tune | 80GB+ | Days | $$$$ |
For fine-tuning, quality matters more than quantity:
Our marketplace offers ready-to-use fine-tuning configurations:
Save weeks of experimentation with battle-tested recipes from the community.