🎯

Fine-Tune LLMs on Cloud GPUs

Fine-tune DeepSeek, Llama, Mistral, and other LLMs with LoRA and QLoRA on affordable cloud GPUs. From budget RTX 4090 to enterprise H100.

Fine-tuning lets you customize pre-trained language models for your specific domain, tone, or task without training from scratch. VoltageGPU makes fine-tuning accessible and affordable with GPUs starting at $0.25/h. Use parameter-efficient methods like LoRA and QLoRA to fine-tune 70B+ parameter models on a single GPU, or scale up to multi-GPU setups for full fine-tuning of the largest open-source models.
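
A quick back-of-envelope calculation shows why 4-bit quantization matters: weight memory scales linearly with parameter count and bits per weight. The sketch below is illustrative only (it ignores activations, optimizer state, and KV cache, so real usage is higher), but it shows how an 8B model's weights drop from 16 GB in fp16 to 4 GB in 4-bit, comfortably inside an RTX 4090's 24 GB:

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Approximate memory needed for model weights alone.

    Ignores activations, optimizer state, and KV cache, so this
    is a rough lower bound on real VRAM usage.
    """
    return n_params * bits / 8 / 1e9  # bits -> bytes -> GB

print(weight_memory_gb(8e9, 16))  # fp16 8B model:  16.0 GB
print(weight_memory_gb(8e9, 4))   # 4-bit 8B model:  4.0 GB
```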

Key Benefits

💸

Budget-Friendly

Fine-tune on RTX 4090 at $0.25/h. LoRA fine-tuning of a 7B model costs under $5 total.

🔬

LoRA & QLoRA Support

Use parameter-efficient fine-tuning to customize 70B+ models on a single GPU with 4-bit quantization.

🤖

All Major Models

Fine-tune DeepSeek, Llama, Mistral, Qwen, Mixtral, and any Hugging Face model out of the box.

📦

Persistent Storage

Your checkpoints and datasets persist across sessions. Resume training anytime without re-uploading.

⚙️

Pre-configured Environment

Unsloth, Axolotl, and Hugging Face TRL come pre-installed for one-command fine-tuning.

🚀

Export & Deploy

Export your fine-tuned model to GGUF, AWQ, or GPTQ format and deploy it as a serverless API on VoltageGPU.
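
A deployed endpoint with an OpenAI-compatible interface accepts the standard chat-completions request format, so any OpenAI-style client works. Below is a minimal sketch of the request body; the base URL and model id are hypothetical placeholders for your own deployment's values:

```python
import json

BASE_URL = "https://api.example.com/v1"  # hypothetical -- use your deployment's URL

# Standard OpenAI-style chat-completions request body.
payload = {
    "model": "my-org/fine-tuned-llama-3.1",  # hypothetical deployed model id
    "messages": [
        {"role": "user", "content": "Summarize this support ticket in one line."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

body = json.dumps(payload)  # POST this to f"{BASE_URL}/chat/completions"
```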


Code Example

Python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load base model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # QLoRA
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj",
                     "o_proj", "gate_proj",
                     "up_proj", "down_proj"],
    lora_dropout=0.05,
)

# Load your custom dataset
dataset = load_dataset("json", data_files="training_data.jsonl")

# Fine-tune with SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    max_seq_length=4096,
    args=TrainingArguments(
        output_dir="./output",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        bf16=True,
    ),
)

trainer.train()
model.save_pretrained("./fine-tuned-llama-3.1")

Frequently Asked Questions

What is the difference between LoRA, QLoRA, and full fine-tuning?
Full fine-tuning updates all model parameters and requires the most VRAM (e.g., 160GB+ for a 70B model). LoRA (Low-Rank Adaptation) freezes the base model and trains small adapter matrices, reducing VRAM to ~40GB for a 70B model. QLoRA adds 4-bit quantization on top of LoRA, allowing you to fine-tune a 70B model on a single 24GB GPU like the RTX 4090.
Which GPU should I choose for fine-tuning?
For 7-8B models with QLoRA: RTX 4090 ($0.25/h) is the best value. For 13-34B models, or LoRA without quantization on 7-8B models: A100 80GB ($1.10/h). For 70B+ models or full fine-tuning: H100 SXM ($2.49/h) or multi-GPU setups.
How long does fine-tuning take?
It depends on model size, dataset size, and GPU. Typical examples: Llama 3.1 8B with QLoRA on 10K examples takes ~30 minutes on an RTX 4090 (~$0.12). Llama 3.3 70B with QLoRA on 50K examples takes ~4 hours on an H100 (~$10).
Can I deploy my fine-tuned model on VoltageGPU?
Yes. After fine-tuning, you can merge the LoRA adapters, convert to GGUF or AWQ format, and deploy it as a serverless API endpoint on VoltageGPU with an OpenAI-compatible interface.
What datasets can I use for fine-tuning?
Any dataset in JSONL, CSV, or Hugging Face Datasets format. Common formats include instruction-following (Alpaca format), chat (ShareGPT format), and completion pairs. You can also create custom datasets from your own documents.
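
To make the JSONL format concrete, here is a minimal sketch that writes two toy Alpaca-style (instruction/input/output) records to the `training_data.jsonl` file loaded in the code example above; the records themselves are invented for illustration:

```python
import json

examples = [
    {
        "instruction": "Classify the sentiment of the review.",
        "input": "The checkout flow was fast and painless.",
        "output": "positive",
    },
    {
        "instruction": "Classify the sentiment of the review.",
        "input": "Support never answered my ticket.",
        "output": "negative",
    },
]

# One JSON object per line -- the JSONL layout expected by
# load_dataset("json", data_files="training_data.jsonl").
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```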

Explore Other Use Cases

Start Building Now

Deploy a GPU pod in under 60 seconds. $5 free credits, no credit card required.

Browse Available GPUs →
Explore Models