🔥 Qwen/Qwen3-32B
High-performance 32B parameter LLM. Excellent for reasoning, coding, and multilingual tasks.
33.54M runs in 7 days
Access 144+ state-of-the-art AI models via API. Serverless inference with 85% cost savings vs OpenAI. OpenAI-compatible API for seamless integration.
Advanced reasoning model with Trusted Execution Environment for secure inference.
7.63M runs in 7 days
Efficient 24B instruction-tuned model. Great balance of speed and quality.
3.5M runs in 7 days
State-of-the-art image generation. Create stunning visuals from text prompts.
High quality, fast generation
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="your-voltagegpu-api-key",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Access 140+ state-of-the-art AI models through VoltageGPU's serverless inference API. Pay only for what you use with competitive per-token pricing. OpenAI-compatible API for seamless integration.
Qwen3 32B parameter model - excellent for reasoning, coding, and multilingual tasks. Hot model with 33M+ runs. Input: $0.15/M tokens, Output: $0.44/M tokens.
DeepSeek V3 with Trusted Execution Environment. Advanced reasoning capabilities. Input: $0.35/M tokens, Output: $1.61/M tokens. 7.6M+ runs.
DeepSeek R1 Chimera variant - optimized for complex reasoning tasks. Input: $0.56/M tokens, Output: $2.22/M tokens. 2.4M+ runs.
Mistral Small 24B instruction-tuned model. Great balance of speed and quality. Input: $0.06/M tokens, Output: $0.20/M tokens. 3.5M+ runs.
Qwen3 235B parameter flagship model with TEE security. Top-tier performance. Input: $0.56/M tokens, Output: $2.22/M tokens.
Google Gemma 3 4B instruction-tuned. Lightweight and fast for simple tasks. Input: $0.02/M tokens, Output: $0.06/M tokens. 1.6M+ runs.
GLM 4.7 with Trusted Execution Environment. Bilingual in Chinese and English. Input: $0.74/M tokens, Output: $2.78/M tokens.
NousResearch Hermes 4 70B - excellent for function calling and tool use. Input: $0.20/M tokens, Output: $0.70/M tokens. 1.1M+ runs.
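The per-million-token rates listed above make it straightforward to estimate what a request will cost before you send it. A minimal sketch (rates are copied from the catalog entries above; actual billing is determined by your account):

```python
# Per-million-token USD rates copied from the catalog above.
# The dictionary keys for Mistral and Gemma are hypothetical model IDs.
RATES = {
    "Qwen/Qwen3-32B": {"input": 0.15, "output": 0.44},
    "mistral-small-24b": {"input": 0.06, "output": 0.20},
    "gemma-3-4b": {"input": 0.02, "output": 0.06},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-million-token rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# 100k prompt tokens plus 50k completion tokens on Qwen3-32B:
print(round(estimate_cost("Qwen/Qwen3-32B", 100_000, 50_000), 4))  # 0.037
```

At these rates, even a long 100k-token prompt on a mid-size model costs a few cents.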
VoltageGPU provides an OpenAI-compatible API. Simply change your base URL and API key to start using our models:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="your-voltagegpu-api-key",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}],
)

Pricing varies by model. Small models like Gemma-3-4B start at $0.02/M input tokens. Large models like Qwen3-235B cost $0.56/M input tokens. All pricing is 85% cheaper than equivalent OpenAI models.
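Since billing is per token, you can meter actual spend from the usage block returned with each response. A sketch assuming the standard OpenAI-style usage fields (`prompt_tokens`, `completion_tokens`):

```python
def spend_from_usage(usage: dict, input_rate: float, output_rate: float) -> float:
    """Convert a response's token usage into USD, given per-million-token rates."""
    return (usage["prompt_tokens"] * input_rate
            + usage["completion_tokens"] * output_rate) / 1_000_000

# Example usage block as returned in a chat completion response.
usage = {"prompt_tokens": 12, "completion_tokens": 200, "total_tokens": 212}

# Qwen3-32B rates from the catalog: $0.15/M input, $0.44/M output.
cost = spend_from_usage(usage, 0.15, 0.44)
print(f"${cost:.6f}")
```

Summing these per-request costs gives a running total you can check against your dashboard.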
Yes! VoltageGPU provides a fully OpenAI-compatible API. You can use the official OpenAI Python or Node.js SDKs by simply changing the base_url to https://api.voltagegpu.com/v1.
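OpenAI compatibility means the wire format is the familiar chat-completions JSON, so even without an SDK you can construct the request yourself. A stdlib-only sketch of what the SDK sends (the endpoint path and headers are assumed to follow the OpenAI convention; the request is built but not sent):

```python
import json
from urllib import request

API_KEY = "your-voltagegpu-api-key"  # placeholder key
BASE_URL = "https://api.voltagegpu.com/v1"

def build_chat_request(model: str, user_message: str) -> request.Request:
    """Build an OpenAI-style chat-completions HTTP request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Qwen/Qwen3-32B", "Hello!")
print(req.full_url)  # https://api.voltagegpu.com/v1/chat/completions
```

Because both the payload shape and the endpoint path match OpenAI's, existing tooling built around that format should work unchanged.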
VoltageGPU offers 140+ models including Qwen3, DeepSeek, Mistral, Llama, Gemma, FLUX for images, and many more. New models are added regularly.
Sign up for a free account, generate an API key from your dashboard, and start making API calls. No credit card required to start. Pay only for what you use.