Pricing

Simple, Transparent Pricing

Choose the plan that fits your inference workload. Upgrade or downgrade any time. No lock-in, no hidden fees.

Starter

For teams deploying their first production AI model and getting familiar with inference infrastructure.

$199/mo

1M requests per month
Up to 3 deployed models
Basic monitoring dashboard
Email support (48h response)
99.9% uptime SLA
REST API access
Standard quantization (INT8)
Community documentation

Get Started

Scale

For high-growth AI teams running high-volume inference workloads that demand reliability and enterprise-grade features.

$599/mo

50M requests per month
Unlimited deployed models
Auto-scaling (GPU-aware)
Priority support (4h response)
Full API + webhooks
99.99% uptime SLA
Advanced quantization (INT4/FP8)
Prometheus / Grafana integration

Get Started

Enterprise

For organizations requiring dedicated infrastructure, compliance, and custom deployment architecture.

Custom

Unlimited requests
Dedicated GPU infrastructure
Custom SLA (up to 99.999%)
On-premise deployment option
24/7 dedicated support
Single Sign-On (SSO/SAML)
SOC 2 Type II & HIPAA
Private VPC + audit logging

Contact Sales

All Plans Include

OpenAI-compatible API

Python & TypeScript SDKs

End-to-end TLS encryption

HuggingFace model import

Usage analytics dashboard

No per-token billing surprises

FAQ

Common Questions

Can I change plans at any time?

Yes. Upgrades are instant. Downgrades take effect at the end of your current billing cycle. No penalties or lock-in.

How is the annual plan billed?

Annual plans are billed upfront for 12 months. The Scale plan is $5,990/year, the equivalent of 10 months at the monthly rate, so you get 2 months free.
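As a quick check of the annual-plan arithmetic, the sketch below compares twelve monthly Scale payments with the upfront annual price. It uses only the prices stated on this page; the function name `annual_savings` is illustrative and not part of any SDK.

```python
# Scale plan list prices, taken from the pricing table above.
SCALE_MONTHLY_USD = 599
SCALE_ANNUAL_USD = 5_990


def annual_savings(monthly: int, annual: int) -> tuple[int, float]:
    """Return (dollars saved, months of service saved) versus 12 monthly payments."""
    twelve_monthly = 12 * monthly          # 12 x $599 = $7,188
    saved = twelve_monthly - annual        # $7,188 - $5,990 = $1,198
    return saved, saved / monthly          # $1,198 / $599 = 2 months


saved, months_free = annual_savings(SCALE_MONTHLY_USD, SCALE_ANNUAL_USD)
print(f"Save ${saved:,} per year ({months_free:.0f} months free)")
# prints "Save $1,198 per year (2 months free)"
```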

What happens if I exceed my monthly request limit?

We notify you when you reach 80% of your monthly request limit. Requests beyond the limit are billed at a per-million rate for the rest of the current billing cycle. Enterprise plans have no hard limits.
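The alert and overage rules can be sketched as follows. This is a minimal illustration under stated assumptions: the per-million overage rate is not published on this page, so it is passed in as a hypothetical parameter, and partial millions are assumed to be prorated linearly. The plan limits and the 80% threshold come from this page; the function names are hypothetical.

```python
# Monthly request limits from the plan table above.
# Enterprise is omitted because it has no hard limits.
PLAN_LIMITS = {"starter": 1_000_000, "scale": 50_000_000}

# Usage alert fires at 80% of the monthly limit.
NOTIFY_PERCENT = 80


def should_notify(plan: str, requests: int) -> bool:
    """True once usage reaches the 80% alert threshold (integer math avoids float rounding)."""
    return requests * 100 >= NOTIFY_PERCENT * PLAN_LIMITS[plan]


def overage_cost(plan: str, requests: int, rate_per_million_usd: float) -> float:
    """Estimate one billing cycle's overage charge.

    ASSUMPTIONS: rate_per_million_usd is hypothetical (the real rate is not
    listed on this page), and partial millions are prorated linearly.
    """
    excess = max(0, requests - PLAN_LIMITS[plan])
    return excess / 1_000_000 * rate_per_million_usd


print(should_notify("starter", 800_000))        # True
print(overage_cost("scale", 53_500_000, 10.0))  # 35.0
```

For example, at a hypothetical rate of $10 per million, a Scale account that serves 53.5M requests in one cycle would owe 3.5 × $10 = $35 in overage.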

Is there a free trial?

We offer a 14-day evaluation for Enterprise customers. For self-serve plans, the Starter tier is priced to be accessible to early-stage teams. Contact us to discuss your needs.

Not Sure Which Plan Fits?

Talk to our team. We will help you identify the right starting point based on your model count, request volume, and latency requirements.

Contact Sales

Explore the Platform