If you've shipped an AI-powered product in the last year, you know the drill. You need GPT-5 for general chat, Claude for document analysis, an embedding model for your search pipeline, Stable Diffusion for image generation, and maybe Whisper for transcription. That's five providers, five API keys, five billing dashboards, five error formats, and five sets of documentation. Then a new model drops, and you do it all over again.
We built ArcaneAPI because this fragmentation is a real cost — not just in engineering hours, but in the decisions that never get made because switching providers means rewriting plumbing code.
The state of the market
The AI model ecosystem in early 2026 is both extraordinary and chaotic. Dozens of models now differ in ways that genuinely matter for production workloads:
- OpenAI's GPT-5.2 offers a 400K context window with 90% prompt caching discounts, making it the go-to for tool-heavy agents.
- Anthropic's Claude Opus 4.6 pushes extended thinking and an experimental 1M-token context beta, aimed at deep compliance and analysis work.
- Google's Gemini 2.5 Pro has a 2M-token context window with native search grounding — ideal for long-document and real-time retrieval tasks.
- Meta's Llama 4 Maverick, a 400B MoE that activates only 17B parameters per forward pass, runs at 562 tokens per second on Groq's LPU hardware for $0.20 per million input tokens.
- DeepSeek V3.1 has disrupted the pricing floor entirely: 671B parameters, 37B active, at $0.60 per million input tokens and $1.70 per million output tokens.
And that's just text. Image generation pricing has split three ways: per-image (Stability AI at $0.065), per-megapixel (FLUX at $0.003 to $0.070), and tokenized (GPT Image 1.5 at $5/$10 per 1M tokens). Video costs anywhere from $0.05 per second (Runway Gen-4 Turbo) to $0.50 per second (Sora 2 Pro at 1080p). Embedding models now come with Matryoshka dimensions, task-specific LoRA adapters, and MoE architectures. The ecosystem is deep, specialized, and moving fast.
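To see how those three pricing schemes compare in practice, here is a quick cost sketch using the per-unit prices quoted above. The 1024x1024 resolution and the 4,160-token output figure are illustrative assumptions, not provider specifics.

```python
# Hypothetical cost math for one generated image under each pricing scheme.
# Prices are the ones quoted above; resolution and token count are assumed.

def per_image_cost(price_per_image: float) -> float:
    return price_per_image

def per_megapixel_cost(price_per_mp: float, width: int, height: int) -> float:
    return price_per_mp * (width * height) / 1_000_000

def per_token_cost(price_per_1m_tokens: float, tokens: int) -> float:
    return price_per_1m_tokens * tokens / 1_000_000

# A 1024x1024 image under each scheme (4,160 output tokens is illustrative):
flat = per_image_cost(0.065)                   # per-image, Stability AI style
mp   = per_megapixel_cost(0.003, 1024, 1024)   # per-megapixel, FLUX low end
tok  = per_token_cost(10.0, 4160)              # tokenized output pricing

print(f"flat=${flat:.4f}  per-mp=${mp:.4f}  tokenized=${tok:.4f}")
```

The takeaway: "which model is cheapest" depends on resolution and output length, which is exactly why normalizing units matters.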
No individual developer or team should have to track all of this to build a product.
What ArcaneAPI actually does
ArcaneAPI is a unified API gateway. You integrate once, and we handle routing to 53 models across 19 providers. The interface is OpenAI-compatible, so if you've used the OpenAI SDK or REST API before, you already know how to use us. Switch between GPT-5.2, Claude Sonnet 4.6, Llama 4 Maverick, or any other model by changing the model parameter in your request. Everything else stays the same.
curl -X POST https://api.arcaneapi.com/v1/chat/completions \
  -H "Authorization: Bearer ak_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarize this contract."}],
    "max_tokens": 2000
  }'
The same endpoint works for reasoning models like O3 and O4 Mini, code specialists like Codestral and Qwen 3 Coder, embedding models like Voyage 4 Large and Jina v5, image generators, video generators, and audio models. One key, one dashboard, one invoice.
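If you prefer to see the request built in code rather than curl, here is a minimal Python sketch using only the standard library. The endpoint, key format, and payload shape mirror the curl example above; swapping models really is a one-string change.

```python
import json
import urllib.request

# Build a chat completions request against the unified endpoint.
# The payload shape is the OpenAI-compatible one shown in the curl example.
def build_request(model: str, prompt: str, max_tokens: int = 2000):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    req = urllib.request.Request(
        "https://api.arcaneapi.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": "Bearer ak_your_api_key",
            "Content-Type": "application/json",
        },
    )
    return req, payload

# Switching from Claude to GPT-5.2 changes exactly one string:
req, payload = build_request("gpt-5.2", "Summarize this contract.")
print(payload["model"])
```

Sending the request (via `urllib.request.urlopen(req)` or any HTTP client) is standard from there; nothing else about the call changes between models.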
Zero markup as a principle
We charge exactly what the upstream provider charges. Not a fraction of a cent more per token. This is a deliberate architectural decision, not a temporary promotional strategy.
The conventional API aggregator business model — buy tokens wholesale and mark them up — is structurally broken in 2026. Developers have full visibility into provider pricing. OpenRouter proved this by enforcing 0% token markup. Unify AI routes to the cheapest qualified provider automatically. If you add a per-token margin, high-volume users leave.
Our revenue comes from a transparent 5% processing fee on credit top-ups (which covers Stripe fees and infrastructure overhead), and eventually from premium enterprise features: latency-based dynamic failover, SOC 2 compliant data residency routing, hierarchical RBAC budget controls, and semantic caching. These are the features that production teams with compliance requirements will pay a flat monthly SaaS fee for — not per-token arbitrage.
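As a quick worked example of that billing model (a sketch, using the 5% fee and DeepSeek's $0.60 input price from above):

```python
# The two halves of the billing model: a flat fee on top-ups, zero markup on usage.

def topup_charge(credits: float, fee_rate: float = 0.05) -> float:
    """Total card charge to receive `credits` dollars of API credit."""
    return credits * (1 + fee_rate)

def token_cost(provider_price_per_1m: float, tokens: int) -> float:
    """Usage billed at exactly the upstream provider's per-1M-token price."""
    return provider_price_per_1m * tokens / 1_000_000

print(f"${topup_charge(100.0):.2f}")           # charge for $100 of credit
print(f"${token_cost(0.60, 2_000_000):.2f}")   # 2M DeepSeek V3.1 input tokens
```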
What we support today
At launch, our model catalog spans seven categories:
- Chat (21 models): Commercial frontier models from OpenAI (GPT-5.2, GPT-5 Mini, GPT-5 Nano, GPT-4.1), Anthropic (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5), Google (Gemini 3.1 Pro, 2.5 Pro, Flash Lite), Mistral (Large 3, Small 3), and xAI (Grok 4). Open-source chat from Meta (Llama 4 Maverick and Scout), Alibaba (Qwen 3 32B and 235B), DeepSeek (V3.1), OpenAI's open weights (GPT-OSS 120B and 20B), and Moonshot AI (Kimi K2).
- Reasoning (4 models): OpenAI O3, O3 Pro, O4 Mini, and DeepSeek R1.
- Code (4 models): Mistral Codestral and Devstral 2, Alibaba Qwen 3 Coder 480B, and xAI Grok Code Fast.
- Image generation (9 models): OpenAI GPT Image 1.5, Stability AI SD 3.5 family (Large, Medium, Flash), Black Forest Labs FLUX (2 Max, 1 Pro, 1 Schnell), Ideogram 3.0, and Kling 2.1 Image.
- Video generation (5 models): OpenAI Sora 2 and Sora 2 Pro, Runway Gen-4 Turbo, Kling 2.1 Pro, and Google Veo 3.0.
- Embeddings (6 models): Voyage AI (4 Large and Lite), Jina AI (v5), Cohere (Embed English v3), and OpenAI (text-embedding-3 Large and Small).
- Audio (4 models): ElevenLabs Multilingual v3, OpenAI Whisper Large v3 Turbo, OpenAI GPT Realtime, and Canopy Labs Orpheus v1.
All pricing is visible on our pricing page with the correct units — per 1M tokens for text models, per image or megapixel for image generators, per second for video, per 1K characters or per hour for audio. No surprises.
How the developer experience works
When you sign up, you get $10 in free credits and can generate an API key immediately. The key starts with ak_ and is shown once — we store only its SHA-256 hash. You authenticate every request with a standard Bearer token header.
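The hash-only storage scheme can be sketched in a few lines. Only the `ak_` prefix and the SHA-256 choice come from the text above; the helper names and key length are illustrative assumptions, not ArcaneAPI internals.

```python
import hashlib
import secrets

def generate_api_key() -> tuple:
    """Return (plaintext_key, sha256_hex). Only the hash is persisted;
    the plaintext is shown to the user exactly once."""
    key = "ak_" + secrets.token_urlsafe(32)   # key length is an assumption
    return key, hashlib.sha256(key.encode()).hexdigest()

def verify(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented_key.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)

key, stored = generate_api_key()
assert key.startswith("ak_")
assert verify(key, stored)
assert not verify("ak_wrong", stored)
```

The upside of this design is that a database leak exposes no usable keys; the downside is that a lost key cannot be recovered, only rotated.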
The dashboard gives you real-time visibility into every API call: which model was used, how many tokens were processed, what it cost, and how fast the response was. You can filter by model, by date range, and by status. Usage alerts notify you when spending crosses thresholds, and low balance warnings give you time to top up before your keys stop working.
We built the entire platform on a straightforward stack — Node.js, Express, SQLite for speed and simplicity, and EJS templates with Tailwind CSS for the frontend. Infrastructure runs on AWS. Email (when we enable it) will be handled through Amazon SES with full CAN-SPAM compliance, bounce monitoring, and one-click unsubscribe.
What comes next
This launch is the foundation. Here is what we're working toward in Q2 2026:
- Streaming support: Server-sent events for all chat and reasoning models, with unified error handling across providers.
- Automatic failover: If a provider has an outage, requests are transparently rerouted to an equivalent model on another provider, based on latency and cost rules you configure.
- Semantic caching: Frequently repeated prompts (common in customer support and documentation bots) are served from cache at a fraction of the original cost.
- Bring Your Own Key (BYOK): Route requests through our layer with your own provider API keys, so you get the orchestration, observability, and fallback logic without us touching your billing.
- Team features: Shared workspaces, per-member budget limits, and usage attribution for organizations.
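The failover behavior described above can be sketched as a simple routing loop: try equivalent models in cost order and fall through on provider errors. Model names, prices, and the `ProviderError` type here are all illustrative, not our actual implementation.

```python
# Sketch of cost-ordered failover across equivalent models.
class ProviderError(Exception):
    pass

def route(prompt, candidates, call):
    """candidates: list of (model, price_per_1m_input) tuples.
    Tries the cheapest first; on a provider error, falls through."""
    for model, _price in sorted(candidates, key=lambda c: c[1]):
        try:
            return model, call(model, prompt)
        except ProviderError:
            continue  # provider outage: try the next candidate
    raise ProviderError("all equivalent models failed")

# Simulate an outage on the cheapest provider:
def fake_call(model, prompt):
    if model == "deepseek-v3.1":
        raise ProviderError("upstream 503")
    return f"{model} ok"

model, out = route(
    "hi",
    [("gpt-5.2", 1.25), ("deepseek-v3.1", 0.60)],  # prices illustrative
    fake_call,
)
print(model)  # request lands on the next-cheapest working model
```

The production version adds latency measurements and your configured rules to the sort key, but the fall-through shape is the same.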
We are building ArcaneAPI for developers who want to use the best model for each task without maintaining a dozen integrations. If that sounds like you, create an account and start building. Your $10 in free credits is waiting.