Alibaba's flagship 72B model excelling at multilingual tasks, coding, and mathematics.
Parameters
72B
Context
131,072 tokens
Organization
Alibaba Cloud
Start using Qwen 2.5 72B in minutes. VoltageGPU provides an OpenAI-compatible API — just change the base_url.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY"
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[
        {"role": "system", "content": "You are a multilingual assistant. Respond in the same language as the user."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=2048,
    temperature=0.7
)

print(response.choices[0].message.content)

curl -X POST https://api.voltagegpu.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a multilingual assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

| Component | Price | Unit |
|---|---|---|
| Input tokens | $0.40 | per 1M tokens |
| Output tokens | $0.40 | per 1M tokens |
New accounts receive $5 free credit. No credit card required to start.
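With the flat rates in the table above, estimating per-request cost is simple arithmetic. A minimal Python sketch (the helper name is illustrative, not part of any VoltageGPU SDK):

```python
# Rates from the pricing table: $0.40 per 1M tokens for both input and output.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 0.40

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 1,500-token prompt that produces a 500-token reply
print(f"${estimate_cost(1500, 500):.6f}")  # → $0.000800
```

At these rates, one million tokens in each direction costs $0.80 total, so the $5 starter credit covers several million tokens of experimentation.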
Qwen 2.5 72B achieves excellent benchmark scores: MMLU (86.1%), HumanEval (86.6%), MATH (83.1%), and GSM8K (91.6%). It supports 29+ languages, structured output (JSON/XML), tool use, and function calling. The model excels at bilingual English-Chinese tasks and offers strong performance in code generation, mathematical reasoning, and long-context processing up to 131K tokens.
Qwen 2.5 72B is Alibaba Cloud's flagship open-weight language model, delivering exceptional performance across English, Chinese, and 27+ additional languages. With 72 billion parameters and a 131K context window, it achieves top-tier results on coding, mathematics, and general knowledge benchmarks. Qwen 2.5 features improved instruction following, structured output generation, and long-context understanding compared to its predecessors. It was trained on 18 trillion tokens of high-quality multilingual data.
Build applications serving users in 29+ languages with strong bilingual English-Chinese capabilities.
Generate high-quality code with top-tier HumanEval scores across multiple programming languages.
Solve complex math problems with step-by-step reasoning and high accuracy.
Extract and generate structured JSON/XML output from unstructured text reliably.
Analyze documents up to 131K tokens for summarization, Q&A, and insight extraction.
POST https://api.voltagegpu.com/v1/chat/completions

| Header | Value | Notes |
|---|---|---|
| Authorization | Bearer YOUR_VOLTAGE_API_KEY | Required |
| Content-Type | application/json | Required |
Model ID: Qwen/Qwen2.5-72B-Instruct

Use this value as the model parameter in your API requests.
curl -X POST https://api.voltagegpu.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a multilingual assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

Great price-performance for smaller models with 24GB VRAM.
Enterprise-grade GPU for production inference at scale.
Access this model and 140+ others through our OpenAI-compatible API.
Compare GPU cloud pricing and model hosting features.
View GPU compute and AI inference pricing with no hidden fees.
Deploy a GPU pod in under 60 seconds to run models locally.
Qwen 2.5 72B supports 29+ languages including English, Chinese (Simplified & Traditional), Japanese, Korean, French, German, Spanish, Portuguese, Arabic, Russian, Thai, Vietnamese, Indonesian, and many more. It is particularly strong in English-Chinese bilingual tasks.
Qwen 2.5 72B generally matches or exceeds Llama 3.3 70B across benchmarks. It scores substantially higher on math (MATH: 83.1% vs 77.0%), while Llama 3.3 70B edges it slightly on coding (HumanEval: 88.4% vs 86.6%). Qwen 2.5 also supports more languages and offers stronger Chinese-language capabilities. At $0.40 per 1M tokens, it is competitively priced.
Yes, Qwen 2.5 72B excels at generating structured output in JSON, XML, and other formats. You can use the response_format parameter to request JSON mode through the VoltageGPU API.
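The response_format parameter mentioned above follows the OpenAI-compatible convention. A sketch of the request arguments for JSON mode (field names assume the standard OpenAI-style schema; confirm support in the VoltageGPU docs):

```python
# JSON-mode request arguments, following the OpenAI-compatible convention.
# Pass these to client.chat.completions.create(**request_kwargs).
request_kwargs = {
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
        {"role": "system",
         "content": "Extract the fields as JSON with keys: name, city."},
        {"role": "user",
         "content": "Alice moved to Berlin last year."},
    ],
    "response_format": {"type": "json_object"},  # request JSON mode
    "max_tokens": 256,
}
print(request_kwargs["response_format"])
```

When JSON mode is enabled, it helps to state the expected keys in the system prompt, as the sketch does; the mode constrains the output to valid JSON but not to a particular schema.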
Qwen 2.5 72B supports a 131,072 token context window, allowing it to process very long documents, codebases, and conversation histories in a single request.
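Because the window above covers the prompt plus the requested completion, it can be worth sanity-checking input size before sending. A rough heuristic sketch (the ~4 characters-per-token ratio is a common approximation for English text, not Qwen's actual tokenizer):

```python
CONTEXT_LIMIT = 131_072  # Qwen 2.5 72B context window, in tokens

def approx_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English."""
    return max(1, len(text) // 4)

def fits_in_context(document: str, max_output_tokens: int = 2048) -> bool:
    """Check that the prompt plus the requested completion stay under the window."""
    return approx_tokens(document) + max_output_tokens <= CONTEXT_LIMIT

print(fits_in_context("hello " * 10_000))  # → True
```

For precise counts, use the model's own tokenizer; this estimate only flags inputs that are clearly too large before you pay for a failed request.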
Get $5 free credit when you sign up. No credit card required. Deploy in under 30 seconds with our OpenAI-compatible API.