Pricing

Simple, Transparent Pricing

Choose the plan that fits your inference workload. Upgrade or downgrade any time. No lock-in, no hidden fees.

Starter

For teams deploying their first production AI model and getting familiar with inference infrastructure.

$199/mo
  • 1M requests per month
  • Up to 3 deployed models
  • Basic monitoring dashboard
  • Email support (48h response)
  • 99.9% uptime SLA
  • REST API access
  • Standard quantization (INT8)
  • Community documentation
Get Started

Enterprise

For organizations requiring dedicated infrastructure, compliance, and custom deployment architecture.

Custom
  • Unlimited requests
  • Dedicated GPU infrastructure
  • Custom SLA (up to 99.999%)
  • On-premise deployment option
  • 24/7 dedicated support
  • Single Sign-On (SSO/SAML)
  • SOC 2 Type II & HIPAA
  • Private VPC + audit logging
Contact Sales

All Plans Include

OpenAI-compatible API
Python & TypeScript SDKs
End-to-end TLS encryption
HuggingFace model import
Usage analytics dashboard
No per-token billing surprise
FAQ

Common Questions

Can I change plans at any time?

Yes. Upgrades are instant. Downgrades take effect at the end of your current billing cycle. No penalties or lock-in.

How is the annual plan billed?

Annual plans are billed upfront for 12 months. The Scale plan is $5,990/year — equivalent to 10 months at the monthly rate, giving you 2 months free.

What happens if I exceed my monthly request limit?

We will notify you at 80% usage. Overage requests are billed at a per-million rate until your next cycle. Enterprise plans have no hard limits.

Is there a free trial?

We offer a 14-day evaluation for Enterprise customers. For self-serve plans, the Starter tier is designed to be accessible for early-stage teams. Contact us to discuss your needs.

Not Sure Which Plan Fits?

Talk to our team. We will help you identify the right starting point based on your model count, request volume, and latency requirements.