GPU & AI Glossary

Essential terms and definitions for GPU compute and AI infrastructure

Quick Navigation

AI/ML · Business · Development · Hardware · Infrastructure · Networking · Optimization · Performance · Technical

AI/ML

Checkpoint

Saved state of a model during training, allowing resume or rollback.
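
A minimal PyTorch sketch (assuming torch is installed; the model and optimizer here are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # placeholder optimizer

# Save a checkpoint: weights plus optimizer state and training progress.
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "epoch": 5,
}, "checkpoint.pt")

# Later: restore the saved state to resume or roll back training.
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
```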

Epoch

One complete pass through the entire training dataset.

Fine-tuning

Adapting a pre-trained model to specific tasks or domains.
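
A common pattern, sketched in PyTorch with placeholder layers: freeze the pre-trained weights and train only a new task-specific head.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a pre-trained backbone and a new task-specific head.
backbone = nn.Linear(768, 768)   # stands in for the pre-trained layers
head = nn.Linear(768, 2)         # new output layer for the target task

# Freeze the pre-trained weights; only the head's parameters will update.
for param in backbone.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
```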

Inference

The process of using a trained AI model to make predictions on new data.
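
In PyTorch terms, inference looks like this (placeholder model; the key points are eval mode and no gradient tracking):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)    # stands in for a trained model
model.eval()                # switch layers like dropout to inference behavior

with torch.no_grad():       # no gradients needed when only predicting
    x = torch.randn(1, 10)  # one new input sample
    prediction = model(x)
```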

LLM

Large Language Model - An AI model trained on vast amounts of text data for natural language understanding and generation.

Tensor

Multi-dimensional array used in deep learning computations.
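
For example, in NumPy (PyTorch and TensorFlow tensors behave the same way):

```python
import numpy as np

# A rank-3 tensor: e.g. a batch of 2 grayscale images, each 3x3.
t = np.zeros((2, 3, 3))
print(t.ndim, t.shape)   # 3 (2, 3, 3)
```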

Token

Basic unit of text processed by language models; for typical English text, one token is roughly four characters, or about three-quarters of a word.
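
A back-of-the-envelope estimate based on that heuristic (real tokenizers vary by model and language):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

print(estimate_tokens("GPU compute for AI workloads"))  # ~7 tokens
```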

Training

The process of teaching an AI model by feeding it data and adjusting its parameters.
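
A stripped-down sketch of the loop structure, with toy data standing in for a real dataset; each outer iteration is one epoch:

```python
dataset = [(x, 2 * x) for x in range(100)]   # toy (input, target) pairs

for epoch in range(3):        # 3 epochs = 3 full passes over the data
    for x, y in dataset:      # every sample is visited once per epoch
        pass                  # forward pass, loss, backprop, update go here
    print(f"epoch {epoch + 1} done")
```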

Business

Markup

Percentage added to base cost to determine final pricing.
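
The arithmetic, with hypothetical numbers:

```python
base_cost = 1.20    # e.g. $/GPU-hour (hypothetical)
markup_pct = 25     # 25% markup

final_price = base_cost * (1 + markup_pct / 100)
print(f"${final_price:.2f}/hr")   # $1.50/hr
```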

SLA

Service Level Agreement - A contractual guarantee of performance and availability targets, such as uptime.

Development

API

Application Programming Interface - A set of protocols for building and integrating application software.

Hot Reload

Updating code or models without restarting the service.

Jupyter

Interactive computing environment for data science and machine learning.

OpenAI Compatible

APIs that follow OpenAI's interface standards, allowing easy migration between providers.
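
For example, the official openai Python client can usually be pointed at any compatible provider; the endpoint URL and model name below are placeholders:

```python
from openai import OpenAI

# Same client, different provider: only base_url and api_key change.
client = OpenAI(
    base_url="https://api.example-gpu-provider.com/v1",  # placeholder URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="example-model",   # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```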

Hardware

GPU

Graphics Processing Unit - Specialized hardware designed for parallel processing, essential for AI training and inference.

VRAM

Video Random Access Memory - Dedicated memory on GPUs used to store model weights and intermediate computations.
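
A quick way to estimate the VRAM needed just for model weights (activations and KV cache add more on top):

```python
params = 7e9            # e.g. a 7-billion-parameter model
bytes_per_param = 2     # FP16 = 16 bits = 2 bytes per weight

weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")   # ~14 GB
```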

Infrastructure

Decentralized

A distributed system with no single point of control or failure.

Docker

Platform for developing, shipping, and running applications in containers.

Kubernetes

Container orchestration platform for automating deployment and scaling.

Pod

A containerized GPU instance that provides isolated compute resources. (In Kubernetes, a Pod more generally means the smallest deployable unit: one or more containers scheduled together.)

SSH

Secure Shell - Protocol for secure remote access to computing resources.

Networking

Egress

Data transfer out of a cloud service, often subject to additional fees.
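
A toy cost estimate (the per-GB rate is hypothetical; check your provider's pricing):

```python
gb_transferred_out = 500
egress_rate_per_gb = 0.09   # hypothetical $/GB

print(f"${gb_transferred_out * egress_rate_per_gb:.2f}")  # $45.00
```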

Optimization

Quantization

Reducing the numeric precision of model weights and activations (e.g., FP16 to INT8) to decrease memory usage and increase speed, usually at a small cost in accuracy.
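
A minimal sketch of symmetric INT8 quantization in NumPy (production libraries add calibration and per-channel scales):

```python
import numpy as np

weights = np.random.randn(4).astype(np.float32)

# Map the float range onto int8 [-127, 127] with a single scale factor.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)   # 4x smaller than float32

dequantized = q.astype(np.float32) * scale      # approximates the original
print(weights, dequantized)
```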

Performance

Batch Processing

Processing multiple requests together to improve efficiency.
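
The core idea in a few lines: chunk incoming requests so each model call serves several at once.

```python
def batches(items, size):
    """Yield successive chunks of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

requests = [f"prompt-{n}" for n in range(10)]
for batch in batches(requests, size=4):
    print(batch)   # one model call per batch instead of per request
```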

Cold Start

Initial delay when starting a service or loading a model for the first time.

Latency

The time delay between sending a request and receiving a response.

P95 Latency

95th percentile response time - 95% of requests complete faster than this.
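
Computed from a list of observed response times, for example with NumPy:

```python
import numpy as np

latencies_ms = [12, 15, 14, 200, 16, 13, 18, 17, 15, 14]  # sample data
p95 = np.percentile(latencies_ms, 95)
print(f"P95 = {p95:.0f} ms")   # 95% of requests were faster than this
```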

Throughput

The amount of work completed per unit of time, e.g. requests per second or tokens per second.
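
For example, measuring an LLM's generation speed (numbers are illustrative):

```python
tokens_generated = 4096
elapsed_seconds = 8.0

throughput = tokens_generated / elapsed_seconds
print(f"{throughput:.0f} tokens/sec")   # 512 tokens/sec
```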

Technical

CUDA

NVIDIA's parallel computing platform and programming model for GPUs.

FP16

Half-precision floating-point format using 16 bits, common in AI workloads.
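
NumPy makes the trade-off easy to see: half the memory per value, fewer significant digits.

```python
import numpy as np

x32 = np.array([1 / 3], dtype=np.float32)
x16 = x32.astype(np.float16)

print(x32.nbytes, x16.nbytes)   # 4 bytes vs 2 bytes per value
print(x32[0], x16[0])           # float16 keeps fewer significant digits
```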

Can't find what you're looking for?

This glossary covers the most common terms in GPU computing and AI infrastructure.

API Documentation · Getting Started Guide

Related Resources

Pricing

Current GPU rates and availability

Benchmarks

Performance comparisons and metrics

Comparisons

VoltageGPU vs other providers

AI Models

Available models and capabilities