Multilingual embedding model supporting 100+ languages with dense, sparse, and multi-vector outputs.

| Spec | Value |
|---|---|
| Parameters | 568M |
| Context | 8,192 tokens |
| Organization | BAAI |
Start using BGE-M3 in minutes. VoltageGPU provides an OpenAI-compatible API — just change the base_url.
```python
from openai import OpenAI
import numpy as np

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY",
)

# Generate embeddings
response = client.embeddings.create(
    model="BAAI/bge-m3",
    input=[
        "How do I deploy a machine learning model?",
        "Steps to put an ML model into production",
        "Best pizza recipe with mozzarella",
    ],
)

# Access embeddings
for i, embedding in enumerate(response.data):
    print(f"Text {i}: {len(embedding.embedding)} dimensions")

# Calculate cosine similarity
v1 = np.array(response.data[0].embedding)
v2 = np.array(response.data[1].embedding)
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Similarity between text 0 and 1: {similarity:.4f}")
```

The same request with curl:

```shell
curl -X POST https://api.voltagegpu.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": [
      "How do I deploy a machine learning model?",
      "Steps to put an ML model into production"
    ]
  }'
```

| Component | Price | Unit |
|---|---|---|
| Tokens | $0.02 | per 1M tokens |
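At $0.02 per 1M tokens, embedding costs are easy to estimate up front. A quick sketch (the corpus size and per-document token count below are illustrative, not measured values):

```python
# Estimate embedding cost at the listed rate of $0.02 per 1M tokens.
PRICE_PER_MILLION_TOKENS = 0.02

def embedding_cost(total_tokens: int) -> float:
    """Return the USD cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# Example: a 50,000-document corpus averaging 500 tokens per document.
corpus_tokens = 50_000 * 500  # 25M tokens
print(f"${embedding_cost(corpus_tokens):.2f}")  # → $0.50
```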
New accounts receive $5 free credit. No credit card required to start.
BGE-M3 generates 1024-dimensional dense embeddings optimized for semantic similarity and retrieval. It achieves state-of-the-art results on MTEB (Massive Text Embedding Benchmark) across multiple languages. The model supports three retrieval modes: dense retrieval (cosine similarity), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching). It handles 100+ languages and processes inputs up to 8,192 tokens.
BGE-M3 (BAAI General Embedding - Multi-Functionality, Multi-Linguality, Multi-Granularity) is a state-of-the-art text embedding model developed by the Beijing Academy of Artificial Intelligence. It supports 100+ languages and generates dense, sparse, and multi-vector embeddings simultaneously. BGE-M3 excels at semantic search, information retrieval, clustering, and classification tasks. With support for up to 8,192 tokens of input, it can embed entire documents for comprehensive semantic representation.
- Build search engines that understand meaning, not just keywords, across 100+ languages.
- Create knowledge bases for LLM grounding with accurate document retrieval.
- Automatically organize and categorize documents by semantic similarity.
- Build content recommendation engines based on semantic similarity between items.
- Identify duplicate or near-duplicate content across large document collections.
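As a sketch of the deduplication use case, near-duplicates can be flagged by thresholding pairwise cosine similarity over the embeddings. The toy 4-dimensional vectors and the 0.95 threshold below are illustrative stand-ins for real 1024-dimensional BGE-M3 output:

```python
import numpy as np

def near_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    """Return index pairs whose cosine similarity exceeds `threshold`."""
    # Normalize rows so the dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

# Toy vectors standing in for real BGE-M3 embeddings.
vecs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.99, 0.01, 0.0, 0.0],  # near-duplicate of the first
    [0.0, 1.0, 0.0, 0.0],
])
print(near_duplicates(vecs))  # → [(0, 1)]
```

The pairwise loop is O(n²), which is fine for a sketch; at scale a vector database's approximate nearest-neighbor search does this step instead.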
Endpoint: https://api.voltagegpu.com/v1/embeddings

| Header | Value | Notes |
|---|---|---|
| Authorization | Bearer YOUR_VOLTAGE_API_KEY | Required |
| Content-Type | application/json | Required |

Model: `BAAI/bge-m3`. Use this value as the `model` parameter in your API requests.
```shell
curl -X POST https://api.voltagegpu.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": [
      "How do I deploy a machine learning model?",
      "Steps to put an ML model into production"
    ]
  }'
```
- Access this model and 140+ others through our OpenAI-compatible API.
- Compare GPU cloud pricing and model hosting features.
- View GPU compute and AI inference pricing with no hidden fees.
- Deploy a GPU pod in under 60 seconds to run models locally.
BGE-M3 produces 1024-dimensional dense embeddings. These can be used directly for cosine similarity search in vector databases like Pinecone, Weaviate, Milvus, or Qdrant.
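Before wiring up one of those databases, the retrieval pattern can be sketched with a small in-memory index using plain NumPy and cosine similarity. The two-dimensional vectors and payload strings here are placeholders for real embeddings and documents:

```python
import numpy as np

class CosineIndex:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        # Normalize on insert so search reduces to a dot product.
        v = np.asarray(vector, dtype=float)
        self.vectors.append(v / np.linalg.norm(v))
        self.payloads.append(payload)

    def search(self, query, top_k=3):
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q
        order = np.argsort(scores)[::-1][:top_k]
        return [(self.payloads[i], float(scores[i])) for i in order]

# Toy vectors; in practice these come from the embeddings endpoint.
index = CosineIndex()
index.add([1.0, 0.0], "deploying ML models")
index.add([0.0, 1.0], "pizza recipes")
print(index.search([0.9, 0.1], top_k=1))  # top hit: "deploying ML models"
```

A production vector database adds the same interface plus persistence, filtering, and approximate nearest-neighbor search for large collections.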
BGE-M3 achieves competitive or superior performance to OpenAI text-embedding-3-large on many MTEB benchmarks while being significantly cheaper ($0.02/M tokens vs $0.13/M tokens). It also supports 100+ languages compared to OpenAI's more limited multilingual support.
Dense embeddings are fixed-size vectors capturing semantic meaning. Sparse embeddings are high-dimensional vectors with mostly zeros, similar to BM25, capturing lexical matches. Multi-vector embeddings generate one vector per token for fine-grained matching (ColBERT-style). BGE-M3 can generate all three simultaneously.
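One common way to use dense and sparse signals together is a weighted sum of the two relevance scores. This is a minimal sketch; the weight and the score values are illustrative, not something the embeddings API returns directly:

```python
def hybrid_score(dense: float, sparse: float, alpha: float = 0.7) -> float:
    """Blend a dense (semantic) score with a sparse (lexical) score.

    `alpha` controls the balance: 1.0 is purely semantic,
    0.0 is purely lexical.
    """
    return alpha * dense + (1 - alpha) * sparse

# A document with a strong semantic match but weak keyword overlap:
score = hybrid_score(dense=0.9, sparse=0.2)  # 0.7*0.9 + 0.3*0.2 ≈ 0.69
print(score)
```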
BGE-M3's 1024-dimensional embeddings are compatible with all major vector databases: Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, and any database supporting cosine similarity search.
BGE-M3 supports inputs up to 8,192 tokens, approximately 6,000 words. This is enough to embed entire articles, long paragraphs, or multiple short documents in a single request.
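For documents longer than the context window, a common workaround is to split the text into chunks before embedding each one. A rough word-based sketch follows; the 6,000-word cutoff mirrors the approximation above, but a real tokenizer should be used for exact token counts:

```python
def chunk_words(text: str, max_words: int = 6000):
    """Split text into chunks of at most `max_words` words.

    6,000 words roughly corresponds to the 8,192-token limit;
    use the model's tokenizer for precise counts.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

long_doc = "word " * 13000
chunks = chunk_words(long_doc)
print(len(chunks))  # 13,000 words → 3 chunks of at most 6,000 words
```

Each chunk can then be sent to the embeddings endpoint separately, and the resulting vectors stored with a reference back to the parent document.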
Get $5 free credit when you sign up. No credit card required. Deploy in under 30 seconds with our OpenAI-compatible API.