BGE-M3 API
Multilingual embedding model supporting 100+ languages with dense, sparse, and multi-vector outputs.
| Spec | Value |
|---|---|
| Parameters | 568M |
| Context | 8,192 tokens |
| Organization | BAAI |
Quick Start
Start using BGE-M3 in minutes. VoltageGPU provides an OpenAI-compatible API — just change the base_url.
```python
from openai import OpenAI
import numpy as np

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY",
)

# Generate embeddings
response = client.embeddings.create(
    model="BAAI/bge-m3",
    input=[
        "How do I deploy a machine learning model?",
        "Steps to put an ML model into production",
        "Best pizza recipe with mozzarella",
    ],
)

# Access embeddings
for i, embedding in enumerate(response.data):
    print(f"Text {i}: {len(embedding.embedding)} dimensions")

# Calculate cosine similarity
v1 = np.array(response.data[0].embedding)
v2 = np.array(response.data[1].embedding)
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Similarity between text 0 and 1: {similarity:.4f}")
```
Pricing
| Component | Price | Unit |
|---|---|---|
| Tokens | $0.02 | per 1M tokens |
New accounts receive $5 free credit. No credit card required to start.
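At $0.02 per 1M tokens, budgeting an embedding job is a one-line calculation. The sketch below is purely illustrative; the corpus size and average document length are made-up numbers, and actual token counts depend on the tokenizer.

```python
# Rough cost estimate at the listed rate of $0.02 per 1M tokens.
PRICE_PER_MILLION_TOKENS = 0.02

def embedding_cost(total_tokens: int) -> float:
    """Return the dollar cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# Example: a 100k-document corpus averaging 500 tokens each = 50M tokens
tokens = 100_000 * 500
print(f"${embedding_cost(tokens):.2f}")  # → $1.00
```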
Capabilities & Benchmarks
BGE-M3 generates 1024-dimensional dense embeddings optimized for semantic similarity and retrieval. It achieves state-of-the-art results on MTEB (Massive Text Embedding Benchmark) across multiple languages. The model supports three retrieval modes: dense retrieval (cosine similarity), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching). It handles 100+ languages and processes inputs up to 8,192 tokens.
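To make the retrieval modes concrete, here is a minimal sketch of how dense and sparse scores can be fused into one hybrid ranking score. The vectors and term weights are toy values, and the `alpha` weighting is an illustrative convention (a weighted sum of mode scores), not the API's own scoring method.

```python
import numpy as np

def dense_score(q: np.ndarray, d: np.ndarray) -> float:
    """Cosine similarity between dense query and document vectors."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def sparse_score(q: dict, d: dict) -> float:
    """Lexical overlap: sum of products of weights for shared terms."""
    return sum(w * d[t] for t, w in q.items() if t in d)

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha=0.7):
    """Weighted fusion of dense and sparse scores (alpha is tunable)."""
    return alpha * dense_score(q_dense, d_dense) + (1 - alpha) * sparse_score(q_sparse, d_sparse)

# Toy example with made-up vectors and term weights
q_dense = np.array([0.1, 0.8, 0.2]); d_dense = np.array([0.2, 0.7, 0.1])
q_sparse = {"deploy": 0.9, "model": 0.5}; d_sparse = {"deploy": 0.8, "pipeline": 0.3}
print(round(hybrid_score(q_dense, d_dense, q_sparse, d_sparse), 3))
```

Dense scores dominate when meaning matters most; raising the sparse weight helps queries with rare exact terms (IDs, product names) that dense vectors can blur.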
About BGE-M3
BGE-M3 (BAAI General Embedding - Multi-Functionality, Multi-Linguality, Multi-Granularity) is a state-of-the-art text embedding model developed by the Beijing Academy of Artificial Intelligence. It supports 100+ languages and generates dense, sparse, and multi-vector embeddings simultaneously. BGE-M3 excels at semantic search, information retrieval, clustering, and classification tasks. With support for up to 8,192 tokens of input, it can embed entire documents for comprehensive semantic representation.
Use Cases
Semantic Search
Build search engines that understand meaning, not just keywords, across 100+ languages.
RAG (Retrieval-Augmented Generation)
Create knowledge bases for LLM grounding with accurate document retrieval.
Document Clustering
Automatically organize and categorize documents by semantic similarity.
Recommendation Systems
Build content recommendation engines based on semantic similarity between items.
Duplicate Detection
Identify duplicate or near-duplicate content across large document collections.
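The duplicate-detection use case above reduces to a similarity threshold over embedding pairs. A minimal sketch, using toy 4-dimensional vectors in place of real 1024-dimensional BGE-M3 embeddings (the 0.95 threshold is an assumption you would tune on your data):

```python
import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    """Return index pairs whose cosine similarity exceeds `threshold`."""
    # Normalize rows so a plain dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                pairs.append((i, j))
    return pairs

# Toy 4-dim "embeddings" (real BGE-M3 vectors are 1024-dim)
vecs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.99, 0.01, 0.0, 0.0],   # near-duplicate of row 0
    [0.0, 1.0, 0.0, 0.0],
])
print(find_near_duplicates(vecs))  # → [(0, 1)]
```

The all-pairs matrix is O(n²); for large collections you would delegate the neighbor search to a vector database instead.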
API Reference
Endpoint
https://api.voltagegpu.com/v1/embeddings
Headers
| Header | Value | Required |
|---|---|---|
| Authorization | Bearer YOUR_VOLTAGE_API_KEY | Yes |
| Content-Type | application/json | Yes |
Model ID
BAAI/bge-m3

Use this value as the model parameter in your API requests.
Example Request
```shell
curl -X POST https://api.voltagegpu.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": [
      "How do I deploy a machine learning model?",
      "Steps to put an ML model into production"
    ]
  }'
```
BGE-M3 — Related Resources
Confidential Compute
Run this model on hardware-sealed GPUs with Intel TDX attestation.
Confidential AI Inference
OpenAI-compatible API with TEE-attested model serving.
Pricing
Confidential Compute and AI Inference pricing with no hidden fees.
Browse Confidential GPUs
H200, H100, B200 with hardware-sealed Intel TDX compute.
Frequently Asked Questions
What is the embedding dimension of BGE-M3?
BGE-M3 produces 1024-dimensional dense embeddings. These can be used directly for cosine similarity search in vector databases like Pinecone, Weaviate, Milvus, or Qdrant.
How does BGE-M3 compare to OpenAI embeddings?
BGE-M3 achieves competitive or superior performance to OpenAI text-embedding-3-large on many MTEB benchmarks while being significantly cheaper ($0.02/M tokens vs $0.13/M tokens). It also supports 100+ languages compared to OpenAI's more limited multilingual support.
What are dense, sparse, and multi-vector embeddings?
Dense embeddings are fixed-size vectors capturing semantic meaning. Sparse embeddings are high-dimensional vectors with mostly zeros, similar to BM25, capturing lexical matches. Multi-vector embeddings generate one vector per token for fine-grained matching (ColBERT-style). BGE-M3 can generate all three simultaneously.
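The multi-vector mode described above is typically scored with ColBERT-style late interaction ("MaxSim"): each query token vector takes its best match over all document token vectors, and those maxima are summed. A minimal sketch with toy 3-dimensional token vectors (real multi-vector outputs are higher-dimensional):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token vector, take
    its maximum cosine similarity over all document token vectors, then
    sum across query tokens."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                  # (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())

# Toy per-token vectors: two query tokens, two document tokens
query = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
doc = np.array([[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
print(round(maxsim_score(query, doc), 3))
```

Because every query token is matched independently, MaxSim rewards documents that cover each part of the query, which is what makes the mode "fine-grained" relative to a single pooled vector.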
What vector databases work with BGE-M3?
BGE-M3's 1024-dimensional embeddings are compatible with all major vector databases: Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, and any database supporting cosine similarity search.
How much text can BGE-M3 embed at once?
BGE-M3 supports inputs up to 8,192 tokens, approximately 6,000 words. This is enough to embed entire articles, long paragraphs, or multiple short documents in a single request.
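For documents longer than the window, a common pattern is to split into overlapping chunks and embed each one. The sketch below uses a word count as a rough proxy for tokens (a real tokenizer would give exact counts); the 6,000-word chunk size and 200-word overlap are illustrative choices, not API requirements.

```python
def chunk_text(text: str, max_words: int = 6000, overlap: int = 200):
    """Split text into overlapping word-based chunks sized to fit
    BGE-M3's 8,192-token window (word count is a rough token proxy)."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = "word " * 10_000          # a 10,000-word toy document
chunks = chunk_text(doc.strip())
print(len(chunks), [len(c.split()) for c in chunks])  # → 2 [6000, 4200]
```

The overlap preserves context across chunk boundaries so a sentence split in two still retrieves well from either side.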
Start using BGE-M3 today
Get $5 free credit when you sign up. No credit card required. Deploy in under 30 seconds with our OpenAI-compatible API.