AI Models Catalog - VoltageGPU Serverless Inference

Available AI Models for Inference

Access 140+ state-of-the-art AI models through VoltageGPU's serverless inference API. Pay only for what you use with competitive per-token pricing. OpenAI-compatible API for seamless integration.

  • Qwen/Qwen3-32B

    Qwen3 32B parameter model - excellent for reasoning, coding, and multilingual tasks. Hot model with 33M+ runs. Input: $0.15/M tokens, Output: $0.44/M tokens.

  • DeepSeek-V3-0324-TEE

    DeepSeek V3 with Trusted Execution Environment. Advanced reasoning capabilities. Input: $0.35/M tokens, Output: $1.61/M tokens. 7.6M+ runs.

  • DeepSeek-R1T-Chimera

    DeepSeek R1 Chimera variant - optimized for complex reasoning tasks. Input: $0.56/M tokens, Output: $2.22/M tokens. 2.4M+ runs.

  • Mistral-Small-3.1-24B-Instruct

    Mistral Small 24B instruction-tuned model. Great balance of speed and quality. Input: $0.06/M tokens, Output: $0.20/M tokens. 3.5M+ runs.

  • Qwen3-235B-A22B-Instruct

    Qwen3 235B parameter flagship model with TEE security. Top-tier performance. Input: $0.56/M tokens, Output: $2.22/M tokens.

  • Gemma-3-4B-IT

    Google Gemma 3 4B instruction-tuned. Lightweight and fast for simple tasks. Input: $0.02/M tokens, Output: $0.06/M tokens. 1.6M+ runs.

  • GLM-4.7-TEE

    GLM 4.7 with Trusted Execution Environment. Chinese and English bilingual. Input: $0.74/M tokens, Output: $2.78/M tokens.

  • Hermes-4-70B

    NousResearch Hermes 4 70B - excellent for function calling and tool use. Input: $0.20/M tokens, Output: $0.70/M tokens. 1.1M+ runs.

Model Categories

  • LLM (Large Language Models) - Text generation, chat, reasoning
  • Image Generation - FLUX, Stable Diffusion, DALL-E style models
  • Embeddings - Text embeddings for RAG and semantic search
  • Vision - Image understanding and analysis
  • Audio - Speech-to-text, text-to-speech
  • Video - Video generation and processing
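
To illustrate the Embeddings category: once you have embedding vectors from any embedding model in the catalog, semantic search ranks documents by cosine similarity against a query vector. A minimal sketch with hand-made toy vectors standing in for real model output (no API call involved):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" -- real models return hundreds of dimensions
query = [0.1, 0.9, 0.2]
docs = {
    "gpu pricing": [0.2, 0.8, 0.1],
    "cake recipe": [0.9, 0.1, 0.3],
}

# Rank documents by similarity to the query
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
# "gpu pricing" scores highest for this query
```

In a real RAG pipeline, the vectors would come from an embeddings endpoint and the top-ranked documents would be passed to an LLM as context.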

Why Use VoltageGPU for AI Inference?

  • Up to 85% cheaper than OpenAI - Competitive per-token pricing
  • OpenAI-compatible API - Drop-in replacement, no code changes
  • 140+ models available - Latest open-source and proprietary models
  • Serverless - No infrastructure to manage, pay per use
  • Low latency - Global edge deployment for fast responses
  • TEE security - Trusted Execution Environment for sensitive data

API Integration Examples

VoltageGPU provides an OpenAI-compatible API. Simply change your base URL and API key to start using our models:

from openai import OpenAI

# Point the official OpenAI SDK at VoltageGPU's endpoint
client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="your-voltagegpu-api-key"
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Popular Use Cases

  • Chatbots and conversational AI
  • Code generation and assistance
  • Content creation and copywriting
  • Document summarization and analysis
  • Translation and multilingual support
  • RAG (Retrieval Augmented Generation) systems
  • AI agents and automation
  • Image generation for marketing and design

Frequently Asked Questions

How much does AI inference cost on VoltageGPU?

Pricing varies by model. Small models like Gemma-3-4B start at $0.02/M input tokens, while large models like Qwen3-235B cost $0.56/M input tokens. Comparable models are typically up to 85% cheaper than their OpenAI equivalents.
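
As a worked example of per-token pricing: the cost of one request is input_tokens/1,000,000 × input rate plus output_tokens/1,000,000 × output rate. Using the Qwen/Qwen3-32B rates listed above ($0.15/M input, $0.44/M output):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD, given per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Qwen/Qwen3-32B rates from the catalog above: $0.15/M in, $0.44/M out
cost = request_cost(2_000, 500, 0.15, 0.44)
# ≈ $0.00052 for a request with 2,000 input and 500 output tokens
```

The token counts here are illustrative; actual counts are returned in the `usage` field of each API response.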

Is VoltageGPU API compatible with OpenAI?

Yes. VoltageGPU provides an OpenAI-compatible API. You can use the official OpenAI Python and Node.js SDKs by changing the base_url to https://api.voltagegpu.com/v1.
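
Because the wire format follows the standard OpenAI chat-completions schema, any HTTP client can call the API directly rather than going through an SDK. The sketch below just builds the JSON request body (the max_tokens value is an arbitrary illustration):

```python
import json

# An OpenAI-style chat-completions payload, built by hand.
# POST this to https://api.voltagegpu.com/v1/chat/completions
# with an "Authorization: Bearer <your-api-key>" header.
payload = {
    "model": "Qwen/Qwen3-32B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,  # illustrative limit, not a required value
}
body = json.dumps(payload)
```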

What models are available on VoltageGPU?

VoltageGPU offers 140+ models including Qwen3, DeepSeek, Mistral, Llama, Gemma, FLUX for images, and many more. New models are added regularly.

How do I get started with VoltageGPU?

Sign up for a free account, generate an API key from your dashboard, and start making API calls. No credit card required to start. Pay only for what you use.
