Qwen 2.5 72B API
Alibaba's flagship 72B model excelling at multilingual tasks, coding, and mathematics.
Parameters: 72B
Context: 131,072 tokens
Organization: Alibaba Cloud
Quick Start
Start using Qwen 2.5 72B in minutes. VoltageGPU provides an OpenAI-compatible API, so existing OpenAI client code works once you point base_url at VoltageGPU and supply your VoltageGPU API key.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY"
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[
        {"role": "system", "content": "You are a multilingual assistant. Respond in the same language as the user."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=2048,
    temperature=0.7
)
print(response.choices[0].message.content)

curl -X POST https://api.voltagegpu.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a multilingual assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

Pricing
| Component | Price | Unit |
|---|---|---|
| Input tokens | $0.40 | per 1M tokens |
| Output tokens | $0.40 | per 1M tokens |
New accounts receive $5 free credit. No credit card required to start.
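As a rough illustration of how the per-token prices above translate into request cost, the sketch below multiplies token counts by the listed $0.40 per 1M rate. The function name `estimate_cost_usd` is hypothetical, and the code assumes billing is strictly linear in tokens with no minimums or rounding, which this page does not state.

```python
# Prices taken from the pricing table (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # 10,000 prompt tokens plus the 2,048-token max_tokens budget from Quick Start.
    print(f"${estimate_cost_usd(10_000, 2_048):.6f}")
```

Under the same linear-billing assumption, the $5 free credit corresponds to roughly 12.5 million tokens in total.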
Capabilities & Benchmarks
Qwen 2.5 72B scores 86.1% on MMLU, 86.6% on HumanEval, 83.1% on MATH, and 91.6% on GSM8K. It supports 29+ languages, structured output (JSON/XML), tool use, and function calling. The model is particularly strong at bilingual English-Chinese tasks and performs well in code generation, mathematical reasoning, and long-context processing up to 131K tokens.
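Tool use and function calling are typically exposed through the `tools` field of an OpenAI-style chat completions request. The sketch below only builds such a payload and sends nothing. The `get_weather` tool and the `build_tool_request` helper are hypothetical, and whether VoltageGPU accepts the `tools` field in exactly this shape is an assumption based on its stated OpenAI compatibility.

```python
import json


def build_tool_request(user_message: str) -> dict:
    """Build a chat completions payload that offers one tool to the model."""
    # Hypothetical tool, described in the OpenAI function-calling schema.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "Qwen/Qwen2.5-72B-Instruct",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        # Assumed to behave as in OpenAI's API: the model decides whether to call the tool.
        "tool_choice": "auto",
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_request("What is the weather in Hangzhou?"), indent=2))
```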
About Qwen 2.5 72B
Qwen 2.5 72B is Alibaba Cloud's flagship open-weight language model, delivering exceptional performance across English, Chinese, and 27+ additional languages. With 72 billion parameters and a 131K context window, it achieves top-tier results on coding, mathematics, and general knowledge benchmarks. Qwen 2.5 features improved instruction following, structured output generation, and long-context understanding compared to its predecessors. It was trained on 18 trillion tokens of high-quality multilingual data.
Use Cases
Multilingual Applications
Build applications serving users in 29+ languages with strong bilingual English-Chinese capabilities.
Code Generation
Generate high-quality code with top-tier HumanEval scores across multiple programming languages.
Mathematical Reasoning
Solve complex math problems with step-by-step reasoning and high accuracy.
Structured Data Extraction
Extract and generate structured JSON/XML output from unstructured text reliably.
Long Document Analysis
Analyze documents up to 131K tokens for summarization, Q&A, and insight extraction.
API Reference
Endpoint
https://api.voltagegpu.com/v1/chat/completions
Headers
| Header | Value | Requirement |
|---|---|---|
| Authorization | Bearer YOUR_VOLTAGE_API_KEY | Required |
| Content-Type | application/json | Required |
Model ID
Qwen/Qwen2.5-72B-Instruct
Use this value as the model parameter in your API requests.
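For environments without the `openai` package, the endpoint, headers, and model ID above can be combined using only the Python standard library. This is a minimal sketch: `build_request` and `send_request` are hypothetical helpers, and the response is assumed to follow the OpenAI chat completions shape, as the Quick Start example implies.

```python
import json
import urllib.request

ENDPOINT = "https://api.voltagegpu.com/v1/chat/completions"
MODEL_ID = "Qwen/Qwen2.5-72B-Instruct"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Assemble (but do not send) a POST request carrying both required headers."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request) -> str:
    """Send the request and return the reply text (needs a valid key and network access)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.load(resp)
    # Assumes an OpenAI-style response body.
    return reply["choices"][0]["message"]["content"]

# Usage:
# print(send_request(build_request("YOUR_VOLTAGE_API_KEY", "Explain quantum computing in simple terms.")))
```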
Example Request
curl -X POST https://api.voltagegpu.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a multilingual assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

Related Models
Qwen 2.5 72B — Related Resources
Confidential Compute
Run this model on hardware-sealed GPUs with Intel TDX attestation.
Confidential AI Inference
OpenAI-compatible API with TEE-attested model serving.
Pricing
Confidential Compute and AI Inference pricing with no hidden fees.
Browse Confidential GPUs
H200, H100, B200 with hardware-sealed Intel TDX compute.
Frequently Asked Questions
What languages does Qwen 2.5 72B support?
Qwen 2.5 72B supports 29+ languages including English, Chinese (Simplified & Traditional), Japanese, Korean, French, German, Spanish, Portuguese, Arabic, Russian, Thai, Vietnamese, Indonesian, and many more. It is particularly strong in English-Chinese bilingual tasks.
How does Qwen 2.5 72B compare to Llama 3.3 70B?
Qwen 2.5 72B is broadly comparable to Llama 3.3 70B on the benchmarks cited here. It scores higher on math (MATH: 83.1% vs 77.0%) and slightly lower on coding (HumanEval: 86.6% vs 88.4%). It also supports more languages and offers better Chinese language capabilities. At $0.40 per 1M tokens, it is competitively priced.
Does Qwen 2.5 72B support structured output?
Yes, Qwen 2.5 72B excels at generating structured output in JSON, XML, and other formats. You can use the response_format parameter to request JSON mode through the VoltageGPU API.
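A minimal sketch of requesting JSON mode follows. It assumes VoltageGPU accepts the same `response_format={"type": "json_object"}` value as OpenAI's API; this page names the parameter but not its value. The helper names and the extraction prompt are hypothetical.

```python
import json


def build_json_mode_kwargs(text: str) -> dict:
    """Keyword arguments for client.chat.completions.create with JSON mode enabled."""
    return {
        "model": "Qwen/Qwen2.5-72B-Instruct",
        "messages": [
            # JSON mode implementations commonly expect the prompt itself to ask for JSON.
            {
                "role": "system",
                "content": 'Extract the person\'s name and city. Reply with a JSON object with keys "name" and "city".',
            },
            {"role": "user", "content": text},
        ],
        # Assumed value, mirroring OpenAI's JSON mode.
        "response_format": {"type": "json_object"},
        "temperature": 0,
    }


def parse_reply(content: str) -> dict:
    """Parse the model's reply, failing loudly if it is not a JSON object."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def extract(client, text: str) -> dict:
    """Send the request with an OpenAI-compatible client and parse the result."""
    response = client.chat.completions.create(**build_json_mode_kwargs(text))
    return parse_reply(response.choices[0].message.content)
```

Here `client` would be the `OpenAI(...)` instance created in Quick Start, for example `extract(client, "Ada lives in London.")`.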
What is the context window of Qwen 2.5 72B?
Qwen 2.5 72B supports a 131,072 token context window, allowing it to process very long documents, codebases, and conversation histories in a single request.
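In most OpenAI-compatible deployments the prompt and the completion share the context window, so a request fits only if prompt tokens plus `max_tokens` stay within 131,072. The sketch below checks that budget under this assumption. The four-characters-per-token estimate is a rough heuristic for English text, not Qwen's actual tokenizer, and the function names are hypothetical.

```python
CONTEXT_WINDOW = 131_072  # tokens, as stated for Qwen 2.5 72B


def rough_token_count(text: str) -> int:
    """Crude estimate of about 4 characters per token (not the real tokenizer)."""
    return (len(text) + 3) // 4


def fits_in_context(prompt: str, max_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the estimated prompt size plus the reply budget fits in the window."""
    return rough_token_count(prompt) + max_tokens <= window


def split_to_fit(text: str, max_tokens: int, window: int = CONTEXT_WINDOW) -> list[str]:
    """Split text into character chunks that each leave room for max_tokens of reply."""
    budget_chars = (window - max_tokens) * 4
    if budget_chars <= 0:
        raise ValueError("max_tokens leaves no room for the prompt")
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]
```

For production use, counting tokens with the model's own tokenizer would give a tighter bound than this character heuristic.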
Start using Qwen 2.5 72B today
Get $5 free credit when you sign up. No credit card required. Deploy in under 30 seconds with our OpenAI-compatible API.