🔥 Qwen/Qwen3-32B
High-performance 32B-parameter LLM. Excellent for reasoning, coding, and multilingual tasks.
Input: $0.15/M tokens | Output: $0.44/M tokens
33.54M runs in 7 days
🧠 DeepSeek-V3-0324-TEE
Advanced reasoning model with a Trusted Execution Environment (TEE) for secure inference.
Input: $0.35/M tokens | Output: $1.61/M tokens
7.63M runs in 7 days
⚡ Mistral-Small-3.1-24B
Efficient 24B instruction-tuned model. Great balance of speed and quality.
Input: $0.06/M tokens | Output: $0.20/M tokens
3.5M runs in 7 days
🎨 FLUX Image Generation
State-of-the-art image generation. Create stunning visuals from text prompts.
$0.003 per image
High quality, fast generation
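With per-million-token rates like those listed above, the cost of a request is simple arithmetic. A minimal sketch using the Qwen/Qwen3-32B rates from this page (the token counts are illustrative):

```python
# Per-million-token rates from the Qwen/Qwen3-32B listing above
INPUT_RATE = 0.15   # USD per 1M input tokens
OUTPUT_RATE = 0.44  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the rates above."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # $0.000520
```

Swap in the input/output rates of any other model card above to estimate its per-request cost the same way.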
Model Categories
- 💬 LLM (Large Language Models)
- 🎨 Image Generation
- 🎬 Video Generation
- 🔢 Embeddings
- 🎤 Text to Speech
- 🎵 Music Generation
Why Choose VoltageGPU for AI Inference?
- ✅ Up to 85% cheaper than OpenAI - competitive per-token pricing
- ✅ OpenAI-compatible API - drop-in replacement, no code changes
- ✅ 144+ models available - latest open-source and proprietary models
- ✅ Serverless - no infrastructure to manage, pay per use
- ✅ TEE security - Trusted Execution Environment for sensitive data
Quick Start - API Integration
from openai import OpenAI

# Point the standard OpenAI client at the VoltageGPU endpoint
client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="your-voltagegpu-api-key",
)

# Chat completion, identical to the OpenAI API shape
response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)