Speech to Text API - Whisper & More
Transcribe audio and video with Whisper Large V3 on VoltageGPU. OpenAI-compatible API. Up to 10x cheaper than alternatives.
VoltageGPU runs OpenAI Whisper Large V3 and other speech-to-text models on GPU-accelerated infrastructure for fast, accurate transcription. Process hours of audio in minutes, support 99+ languages, and get word-level timestamps. Our OpenAI-compatible API makes migration effortless, and our pricing is up to 10x lower than hosted alternatives.
Key Benefits
Whisper Large V3
One of the most accurate open-source speech recognition models, with 99%+ accuracy on clean English audio.
99+ Languages
Transcribe audio in over 99 languages with automatic language detection. No model switching needed.
OpenAI-Compatible
Use the same OpenAI SDK and API format. Migrate from the OpenAI Whisper API by changing the base URL and API key.
Word-Level Timestamps
Get precise word-level timestamps for subtitle generation, content navigation, and searchable audio.
Up to 10x Cheaper
Whisper API on VoltageGPU costs ~$0.003/min vs $0.006/min on OpenAI, about half the price at those rates. Bulk processing is cheaper still.
Batch Processing
Transcribe hundreds of hours of audio in parallel. Ideal for podcast archives, call centers, and media companies.
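As a rough illustration of the batch workflow described above, the sketch below fans a list of files out over a thread pool. The helper names `transcribe_many` and `make_transcriber` are hypothetical and not part of any VoltageGPU SDK. The client is assumed to be an OpenAI-style client configured as in the Code Example section. Concurrency and rate limits are not stated on this page, so `max_workers=4` is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    transcribe_one: Callable[[str], str],
    max_workers: int = 4,  # placeholder; tune to your actual rate limits
) -> Dict[str, str]:
    """Run transcribe_one over many files concurrently and return {path: text}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        texts = list(pool.map(transcribe_one, paths))
    return dict(zip(paths, texts))


def make_transcriber(client, model: str = "whisper-large-v3") -> Callable[[str], str]:
    """Wrap an OpenAI-style client (assumed) in a one-file transcription function."""
    def transcribe_one(path: str) -> str:
        with open(path, "rb") as audio_file:
            result = client.audio.transcriptions.create(model=model, file=audio_file)
        return result.text
    return transcribe_one
```

With a configured client, `transcribe_many(["ep1.mp3", "ep2.mp3"], make_transcriber(client))` would return a dictionary keyed by file path. Passing the per-file function in as an argument keeps the fan-out logic testable without network access.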
Code Example
from openai import OpenAI

# Initialize VoltageGPU client
client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="YOUR_VOLTAGE_API_KEY"
)

# Transcribe an audio file
with open("interview.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )

print(f"Transcription: {transcript.text}")
print(f"Language: {transcript.language}")
print(f"Duration: {transcript.duration}s")

# Access word-level timestamps
for word in transcript.words:
    print(f"  [{word.start:.2f}s - {word.end:.2f}s] {word.word}")

# Translation (any language to English)
with open("french_podcast.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(f"English translation: {translation.text}")

Frequently Asked Questions
How accurate is Whisper Large V3 on VoltageGPU?
How does pricing compare to OpenAI Whisper API?
What audio formats are supported?
Can I get subtitles in SRT or VTT format?
Is real-time transcription supported?
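On the subtitle question above: this page does not say whether the API returns SRT or VTT directly. One option is to build SRT on the client from the word-level timestamps. The sketch below is a minimal version that groups a fixed number of words per cue. `words_to_srt`, `format_timestamp`, and the `max_words` default are hypothetical names and values chosen for illustration.

```python
from typing import Iterable, List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Iterable[Word], max_words: int = 7) -> str:
    """Group words into fixed-size cues and render them as SRT text."""
    words = list(words)
    cues: List[str] = []
    for index, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        # A cue spans from its first word's start to its last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0].strip() for w in chunk)
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Each cue already ends with a newline, so joining with "\n" leaves
    # the blank line that SRT expects between cues.
    return "\n".join(cues)
```

With the `transcript` object from the Code Example, the call would look like `words_to_srt((w.word, w.start, w.end) for w in transcript.words)`. Fixed-size grouping is the simplest possible policy. Splitting on pauses or punctuation would likely read better in real subtitles.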
Start Building Now
Deploy a GPU pod in under 60 seconds. $5 free credits, no credit card required.