Qwen/Qwen2.5-72B-Instruct
High-performance 72B parameter LLM. Excellent for reasoning, coding, and multilingual tasks.
Alibaba's flagship 72B model excelling at multilingual tasks, coding, and mathematics.
| Spec | Value |
|---|---|
| Parameters | 72B |
| Context | 131,072 tokens |
| Organization | Alibaba Cloud |
Start using Qwen 2.5 72B in minutes. VoltageGPU provides an OpenAI-compatible API — just change the base_url.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[
        {"role": "system", "content": "You are a multilingual assistant. Respond in the same language as the user."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    max_tokens=2048,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

The same request with curl:

```shell
curl -X POST https://api.voltagegpu.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a multilingual assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'
```

| Component | Price | Unit |
|---|---|---|
| Input tokens | $0.40 | per 1M tokens |
| Output tokens | $0.40 | per 1M tokens |
New accounts receive $5 free credit. No credit card required to start.
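Because input and output tokens are billed at the same rate, estimating a request's cost is simple arithmetic. A minimal sketch (the `estimate_cost` helper is illustrative, not part of any SDK):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_per_m: float = 0.40) -> float:
    """Estimate USD cost at a flat $0.40 per 1M tokens (input and output)."""
    return (input_tokens + output_tokens) / 1_000_000 * price_per_m

# A request with a 100K-token document and a 2K-token summary:
print(f"${estimate_cost(100_000, 2_000):.4f}")  # → $0.0408
```

At these rates, the $5 free credit covers roughly 12.5M tokens of combined input and output.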
Qwen 2.5 72B achieves excellent benchmark scores: MMLU (86.1%), HumanEval (86.6%), MATH (83.1%), and GSM8K (91.6%). It supports 29+ languages, structured output (JSON/XML), tool use, and function calling. The model excels at bilingual English-Chinese tasks and offers strong performance in code generation, mathematical reasoning, and long-context processing up to 131K tokens.
Qwen 2.5 72B is Alibaba Cloud's flagship open-weight language model, delivering exceptional performance across English, Chinese, and 27+ additional languages. With 72 billion parameters and a 131K context window, it achieves top-tier results on coding, mathematics, and general knowledge benchmarks. Qwen 2.5 features improved instruction following, structured output generation, and long-context understanding compared to its predecessors. It was trained on 18 trillion tokens of high-quality multilingual data.
Build applications serving users in 29+ languages with strong bilingual English-Chinese capabilities.
Generate high-quality code with top-tier HumanEval scores across multiple programming languages.
Solve complex math problems with step-by-step reasoning and high accuracy.
Extract and generate structured JSON/XML output from unstructured text reliably.
Analyze documents up to 131K tokens for summarization, Q&A, and insight extraction.
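The feature list above mentions tool use and function calling. A minimal sketch of a request body carrying a tool definition, assuming the OpenAI-style `tools` schema that the compatible API implies (`get_weather` is a hypothetical example function, not a provided tool):

```python
import json

# Hypothetical tool definition; the model emits a tool call with JSON
# arguments when the user's question matches the description.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example function
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Same shape as the curl examples, plus the tools field; pass these
# keys to client.chat.completions.create(...) to use the Python SDK.
body = {
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Shanghai?"}],
    "tools": tools,
    "max_tokens": 2048,
}
print(json.dumps(body, indent=2))
```

When the model decides to call the tool, the response's first choice carries a `tool_calls` entry whose `function.arguments` field is a JSON string matching the declared parameter schema.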
Endpoint: https://api.voltagegpu.com/v1/chat/completions

| Header | Value | Notes |
|---|---|---|
| Authorization | Bearer YOUR_VOLTAGE_API_KEY | Required |
| Content-Type | application/json | Required |

Model ID: Qwen/Qwen2.5-72B-Instruct. Use this value as the model parameter in your API requests.
```shell
curl -X POST https://api.voltagegpu.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a multilingual assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'
```

Great price-performance for smaller models with 24GB VRAM.
Enterprise-grade GPU for production inference at scale.
Access this model and 140+ others through our OpenAI-compatible API.
Compare GPU cloud pricing and model hosting features.
View GPU compute and AI inference pricing with no hidden fees.
Deploy a GPU pod in under 60 seconds to run models locally.
Qwen 2.5 72B supports 29+ languages including English, Chinese (Simplified & Traditional), Japanese, Korean, French, German, Spanish, Portuguese, Arabic, Russian, Thai, Vietnamese, Indonesian, and many more. It is particularly strong in English-Chinese bilingual tasks.
Qwen 2.5 72B generally matches or exceeds Llama 3.3 70B on most benchmarks. It scores slightly lower on coding (HumanEval: 86.6% vs 88.4%) but higher on math (MATH: 83.1% vs 77.0%). It also supports more languages and offers stronger Chinese-language capabilities. At $0.40 per 1M tokens, its pricing is competitive.
Yes, Qwen 2.5 72B excels at generating structured output in JSON, XML, and other formats. You can use the response_format parameter to request JSON mode through the VoltageGPU API.
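A minimal sketch of a JSON-mode request, assuming the OpenAI-style `response_format` parameter mentioned above (the `json_mode_request` helper and its prompts are illustrative):

```python
import json

def json_mode_request(prompt: str, schema_hint: str) -> dict:
    """Build kwargs for client.chat.completions.create() that ask
    the model for JSON-only output via response_format."""
    return {
        "model": "Qwen/Qwen2.5-72B-Instruct",
        "messages": [
            {"role": "system",
             "content": f"Reply with JSON only. Schema hint: {schema_hint}"},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
        "temperature": 0.0,  # low temperature suits extraction tasks
    }

kwargs = json_mode_request(
    "Extract name and year: 'Qwen 2.5 was released by Alibaba in 2024.'",
    '{"name": string, "year": number}',
)
print(json.dumps(kwargs["response_format"]))  # → {"type": "json_object"}
```

Stating the expected schema in the system prompt alongside `response_format` tends to make the returned JSON easier to validate, since JSON mode guarantees well-formed JSON but not a particular shape.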
Qwen 2.5 72B supports a 131,072 token context window, allowing it to process very long documents, codebases, and conversation histories in a single request.
Get $5 free credit when you sign up. No credit card required. Deploy in under 30 seconds with our OpenAI-compatible API.