Text Embeddings | BAAI | Open Source | Multilingual | Efficient

BGE-M3 API

Multilingual embedding model supporting 100+ languages with dense, sparse, and multi-vector outputs.

Parameters

568M

Context

8,192 tokens

Organization

BAAI

Pricing

$0.02

per 1M tokens

Try BGE-M3 for Free

Quick Start

Start using BGE-M3 in minutes. VoltageGPU provides an OpenAI-compatible API — just change the base_url.

Python (OpenAI SDK)
pip install openai numpy

import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY"
)

# Generate embeddings for three texts: two paraphrases and one unrelated sentence
response = client.embeddings.create(
    model="BAAI/bge-m3",
    input=[
        "How do I deploy a machine learning model?",
        "Steps to put an ML model into production",
        "Best pizza recipe with mozzarella"
    ]
)

# Each item in response.data holds one embedding vector
for i, embedding in enumerate(response.data):
    print(f"Text {i}: {len(embedding.embedding)} dimensions")

# Cosine similarity: the two paraphrases should score much higher than the pizza text
v1 = np.array(response.data[0].embedding)
v2 = np.array(response.data[1].embedding)
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Similarity between text 0 and 1: {similarity:.4f}")
cURL
curl -X POST https://api.voltagegpu.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": [
      "How do I deploy a machine learning model?",
      "Steps to put an ML model into production"
    ]
  }'

Pricing

Component | Price | Unit
Tokens | $0.02 | per 1M tokens

New accounts receive $5 free credit. No credit card required to start.
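At a flat per-token rate, cost scales linearly with corpus size. A quick back-of-the-envelope helper (the 250M-token corpus size is illustrative):

```python
# Cost estimate at the listed rate of $0.02 per 1M tokens.
PRICE_PER_M_TOKENS = 0.02

def embedding_cost(n_tokens):
    """Return the USD cost of embedding n_tokens."""
    return n_tokens / 1_000_000 * PRICE_PER_M_TOKENS

cost = embedding_cost(250_000_000)  # a 250M-token corpus -> $5.00
```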


Capabilities & Benchmarks

BGE-M3 generates 1024-dimensional dense embeddings optimized for semantic similarity and retrieval. It achieves state-of-the-art results on MTEB (Massive Text Embedding Benchmark) across multiple languages. The model supports three retrieval modes: dense retrieval (cosine similarity), sparse retrieval (lexical matching like BM25), and multi-vector retrieval (ColBERT-style fine-grained matching). It handles 100+ languages and processes inputs up to 8,192 tokens.
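When both dense and sparse rankings are available, they can be merged client-side. Reciprocal Rank Fusion (RRF) is one common, score-free way to do this; the sketch below assumes you already have two ranked lists of document IDs (the IDs are illustrative, and RRF itself is a general technique, not part of this API):

```python
# Reciprocal Rank Fusion (RRF): merge ranked lists produced by BGE-M3's
# dense (cosine) and sparse (lexical) retrieval modes into one ranking.

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc IDs; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # from cosine similarity
sparse_ranking = ["doc_a", "doc_c", "doc_d"]  # from lexical matching

fused = rrf([dense_ranking, sparse_ranking])  # doc_a ranks first
```

Documents that appear near the top of both lists dominate the fused ranking, which is why hybrid retrieval often beats either mode alone.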


About BGE-M3

BGE-M3 (BAAI General Embedding - Multi-Functionality, Multi-Linguality, Multi-Granularity) is a state-of-the-art text embedding model developed by the Beijing Academy of Artificial Intelligence. It supports 100+ languages and generates dense, sparse, and multi-vector embeddings simultaneously. BGE-M3 excels at semantic search, information retrieval, clustering, and classification tasks. With support for up to 8,192 tokens of input, it can embed entire documents for comprehensive semantic representation.


Use Cases

🔍

Semantic Search

Build search engines that understand meaning, not just keywords, across 100+ languages.

📚

RAG (Retrieval-Augmented Generation)

Create knowledge bases for LLM grounding with accurate document retrieval.

📁

Document Clustering

Automatically organize and categorize documents by semantic similarity.

🎯

Recommendation Systems

Build content recommendation engines based on semantic similarity between items.

🔄

Duplicate Detection

Identify duplicate or near-duplicate content across large document collections.
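For the semantic search and RAG use cases above, the retrieval step reduces to a top-k cosine search over precomputed embeddings. A minimal sketch: in practice the vectors come from the /v1/embeddings endpoint and have 1024 dimensions; the 4-dimensional vectors here are stand-ins.

```python
import numpy as np

# Top-k retrieval over precomputed document embeddings, as used in a
# RAG pipeline: embed the query, rank documents by cosine similarity.

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k].tolist()

docs = [[1, 0, 0, 0], [0.9, 0.1, 0, 0], [0, 0, 1, 0]]
hits = top_k([1, 0, 0, 0], docs)  # indices of the two closest documents
```

For large collections, a vector database replaces the brute-force scan, but the scoring logic is the same.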


API Reference

Endpoint

POST https://api.voltagegpu.com/v1/embeddings

Headers

Authorization: Bearer YOUR_VOLTAGE_API_KEY (Required)
Content-Type: application/json (Required)

Model ID

BAAI/bge-m3

Use this value as the model parameter in your API requests.

Example Request

curl -X POST https://api.voltagegpu.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_VOLTAGE_API_KEY" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": [
      "How do I deploy a machine learning model?",
      "Steps to put an ML model into production"
    ]
  }'



Frequently Asked Questions

What is the embedding dimension of BGE-M3?

BGE-M3 produces 1024-dimensional dense embeddings. These can be used directly for cosine similarity search in vector databases like Pinecone, Weaviate, Milvus, or Qdrant.

How does BGE-M3 compare to OpenAI embeddings?

BGE-M3 achieves competitive or superior performance to OpenAI text-embedding-3-large on many MTEB benchmarks while being significantly cheaper ($0.02/M tokens vs $0.13/M tokens). It also supports 100+ languages compared to OpenAI's more limited multilingual support.

What are dense, sparse, and multi-vector embeddings?

Dense embeddings are fixed-size vectors capturing semantic meaning. Sparse embeddings are high-dimensional vectors with mostly zeros, similar to BM25, capturing lexical matches. Multi-vector embeddings generate one vector per token for fine-grained matching (ColBERT-style). BGE-M3 can generate all three simultaneously.
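The multi-vector mode scores a query against a document with the ColBERT MaxSim rule: for each query-token vector, take its maximum similarity to any document-token vector, then sum. A toy illustration with 2-D stand-in token vectors (BGE-M3's real per-token vectors are much larger):

```python
import numpy as np

# ColBERT-style MaxSim scoring: sum over query tokens of the best
# dot-product match against any document token.

def maxsim(query_tokens, doc_tokens):
    """Return the MaxSim relevance score between two token-vector sets."""
    q = np.asarray(query_tokens, dtype=float)
    d = np.asarray(doc_tokens, dtype=float)
    sims = q @ d.T              # (n_query, n_doc) pairwise dot products
    return sims.max(axis=1).sum()

q = [[1, 0], [0, 1]]
d = [[1, 0], [0.5, 0.5]]
score = maxsim(q, d)  # 1.0 (first token) + 0.5 (second token) = 1.5
```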

What vector databases work with BGE-M3?

BGE-M3's 1024-dimensional embeddings are compatible with all major vector databases: Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, and any database supporting cosine similarity search.

How much text can BGE-M3 embed at once?

BGE-M3 supports inputs up to 8,192 tokens, approximately 6,000 words. This is enough to embed entire articles, long paragraphs, or multiple short documents in a single request.
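A rough client-side pre-check before embedding very long text: assuming roughly 4 characters per English token (a heuristic, not the model's actual tokenizer), split anything over the limit into chunks that fit:

```python
# Split text that may exceed the 8,192-token context into chunks,
# using a ~4-characters-per-token heuristic for English text.

MAX_TOKENS = 8192
CHARS_PER_TOKEN = 4  # heuristic estimate, not the real tokenizer

def chunk_text(text, max_tokens=MAX_TOKENS):
    """Split text into character-bounded chunks that should fit the context."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = chunk_text("x" * 100_000)  # 100k chars -> 4 chunks
```

For production use, chunking on sentence or paragraph boundaries (or with a real tokenizer) gives better retrieval quality than fixed character windows.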


Start using BGE-M3 today

Get $5 free credit when you sign up. No credit card required. Deploy in under 30 seconds with our OpenAI-compatible API.