AI Middleware-as-a-Service: The BYOK, Routing, Context-as-Data & Billing Convergence

Executive Summary

Every AI SaaS company reinvents the same infrastructure: key management, model selection, fallback chains, token metering, and usage-based billing. This report maps the landscape of existing point solutions, identifies the critical gap between "config file" tools and production products, and proposes an integrated AI Middleware-as-a-Service platform that combines BYOK gateway, intelligent model routing, context-as-data management, and AI-native payment/billing into a single product. The market opportunity is substantial: ~33,000 AI companies worldwide (Exploding Topics), an AI SaaS market projected at $142B in 2026 (Coherent Market Insights), and enterprise LLM API spending surging past $8.4B (Maxim AI). No existing product spans all four layers. The proposed platform could capture 2-5% take rate on payment flow plus per-request routing fees, targeting $5-15M ARR within 18 months.

Part 1: Market & Problem Definition

1. The AI API Orchestration Problem

Scale of the problem. As of October 2025, there are approximately 33,089 AI companies worldwide (Exploding Topics), with an estimated 200,000 SaaS companies globally (SEO.ai). Stanford's 2024 AI Index reports just over 10,000 AI startups across the top ten leading countries (Salesforce Ben). The AI SaaS market is projected at $142.02B in 2026 (Coherent Market Insights), growing at a 39.6% CAGR to $1,051B by 2033.

The reinvention tax. Every AI SaaS company must solve the same infrastructure problems:

Key management — Store, rotate, and isolate API keys per tenant. SOC2 compliance requires encrypted key storage, audit trails, and access controls.
Model selection & routing — Choose between GPT-4o, Claude Sonnet, Gemini Pro, and dozens of others. Implement fallback chains when providers rate-limit or go down.
Token metering — Track per-request token consumption across providers with different pricing models (per-token, per-image, per-second for audio).
Usage-based billing — Convert token consumption into customer invoices. Handle credits, overages, and hybrid subscription+usage models.
Cost passthrough — Manage the margin between upstream model cost and customer-facing price.

Quantifying wasted effort. Based on industry patterns:

A typical AI SaaS spends 2-4 engineer-months building key management, routing, and billing infrastructure before shipping any AI feature.
At average US SaaS engineer compensation (~$180K/year), that's $30K-$60K per company in pure infrastructure cost.
With 10,000+ AI startups, the aggregate waste exceeds $300M-$600M in duplicated engineering effort.
Enterprise LLM API spending has surged past $8.4B, with inference costs projected to reach $15B by end of 2026 (Maxim AI).

2. Current Solutions & Gaps

The current landscape fragments into several categories:

LLM Gateways & Routers

Tool	What It Does	What It Misses	Pricing	Traction	Funding
LiteLLM	Open-source Python proxy, 100+ providers, OpenAI-compatible API, per-team budgets, fallback chains	No UI, no billing, no BYOK for end-users, no context management, no payment	Free (OSS) / Managed cloud	~15K GitHub stars (PkgPulse)	Bootstrapped
Portkey	Enterprise AI gateway, semantic caching, guardrails, prompt versioning, advanced observability	No end-user BYOK, no billing/payment, no context marketplace, pricing opaqueness	Free tier (10K req/mo), paid plans	~8K GitHub stars (PkgPulse)	$18M total (Series A: $15M from Elevation Capital, Feb 2026) (Portkey blog, Tracxn)
OpenRouter	SaaS marketplace, 400+ models, one API key, pay-per-token, BYOK support (1M free BYOK req/mo)	No billing infrastructure, no context management, US-based only (GDPR concern), limited routing intelligence	5-15% markup over provider rates (PkgPulse)	~2K GitHub stars	Bootstrapped
Martian	Model router, dynamic routing, cost reduction 20-97%	Very narrow (routing only), no billing, no BYOK gateway, no context	Subscription-based (Dealroom)	Small	$9M seed (NEA, General Catalyst, Prosus Ventures) (HPCwire)
Helicone	LLM observability, cost tracking, caching, gateway	No billing, no end-user BYOK, no payment, no context management	$60+/mo (Truefoundry)	YC-backed, 2B+ LLM interactions processed (Helicone blog)	Y Combinator
Cloudflare AI Gateway	Free proxy, caching, logging, now dynamic routing (Aug 2025)	Limited routing intelligence, no billing, no context management	Free tier available (Truefoundry)	Massive via Cloudflare ecosystem	N/A (Cloudflare)

Cloud Provider Platforms

Platform	What It Does	What It Misses
Amazon Bedrock	Multi-model access, fine-tuning, guardrails, prompt routing	AWS lock-in, no billing for downstream customers, no BYOK from end-users, complex pricing (Truefoundry)
Azure AI Studio/Foundry	Model catalog, deployment, prompt flow, evaluation	Azure lock-in, enterprise-only focus, no consumer billing
Google Vertex AI	Model garden, endpoints, evaluation	GCP lock-in, complex pricing, no billing passthrough

Key Gap: Config File vs. Product

The critical gap is between infrastructure primitives (LiteLLM = config file, open-source, self-host) and production products (what doesn't exist yet). No current solution provides:

End-user BYOK — Most gateways manage their own keys. None let your end-users bring their API keys with per-tenant isolation.
Billing & payment — No LLM gateway includes usage-based billing or MoR payment processing.
Context-as-data — No routing platform treats prompts/context as versioned, tradeable data assets.
Integrated product — The four layers (gateway, routing, context, billing) require 4+ separate vendors today.

3. BYOK Landscape

What BYOK means in this context: Two distinct BYOK patterns exist:

Provider-side BYOK — The platform (OpenRouter, together.ai) lets you use your own provider API keys. OpenRouter stores your keys encrypted, offers 1M free BYOK requests/month, supports key priority/fallback, and model/member/api-key filters (OpenRouter BYOK docs). Together.ai and Anyscale focus on compute BYOK (bring your own GPU cluster).
End-user BYOK — The AI application lets its end-users plug in their own API keys. Tools like JetBrains IDEs (JetBrains blog), Warp terminal, SurfMind (SurfMind blog), ThinkForce, Chatbox AI, and Aymo AI support this.

Security implications:

Key storage: Keys must be encrypted at rest (AES-256), never logged, never exposed in error messages.
Key rotation: OpenRouter supports API key rotation with minimal downtime (OpenRouter docs).
Per-tenant isolation: Each user's keys must be isolated — no cross-tenant key leakage.
Compliance: SOC2 requires key management controls. GDPR applies to any key metadata stored in EU. PCI DSS if keys relate to payment.

Gap: No platform offers a BYOK gateway as a service — a hosted API where SaaS companies can let their end-users register provider keys, and the gateway handles encryption, isolation, rotation, and routing through those keys. This is the "Stripe for API keys" opportunity.

4. Model Routing State of the Art

Academic research:

Paper/Method	Approach	Key Results	Source
RouteLLM (LMSYS, 2024)	Preference-data-trained routers (similarity-weighted ranking, matrix factorization, BERT classifier, causal LLM classifier)	85%+ cost reduction on MT Bench at 95% GPT-4 quality. Outperforms commercial routers (Martian, Unify) by 40%+ cost savings	LMSYS Blog
FrugalGPT (2023)	Query-adaptive routing + prompt adaptation + caching	Significant cost reduction while maintaining quality	Awesome Routing LLMs
Hybrid LLM (ICLR 2024)	Router assigns queries to small/large model based on predicted difficulty	Cost-efficient quality-aware query routing	ICLR 2024
Cost-Aware Contrastive Routing (2025)	Prompt-specific cost-aware routing using contrastive learning	Addresses prompt-specific context in routing decisions	arXiv
NeuralUCB (2025)	Bandit algorithm for cost-aware routing	Balances quality and cost online	AlanHou blog
CARROT (2025)	Cost-Aware Rate Optimal Router	Optimal routing under cost constraints	ResearchGate
Adaptive Model & Strategy Routing (WWW 2025)	Combines model and strategy routing	Comprehensive routing framework	USTC paper
AttnTrace (2025)	Attention-based context traceback for long-context LLMs	Attribution of which context fragments influenced output	arXiv

Production implementations:

LiteLLM: Least-busy routing, round-robin, latency-based. Config-based, not ML-driven.
Portkey: Weighted load balancing, fallback chains, latency-aware routing. More sophisticated than LiteLLM but not ML-based.
OpenRouter: Provider ordering, fallback. Basic routing intelligence.
Martian: Claims dynamic ML-based routing, 20-97% cost reduction (Plug and Play).
Cloudflare AI Gateway: Dynamic routing added Aug 2025, confidence scores (Cloudflare blog).

The gap between research and product:

RouteLLM achieves 85% cost savings on benchmarks, but production routing is still rule-based (fallback chains, round-robin).
No commercial product offers preference-data-trained routing as a service.
Production routing must handle: multi-model fallback, rate limits, context window mismatches, cost budgets, latency SLAs — none of which academic papers address comprehensively.
The "data flywheel" (more routing decisions → better router) exists in research but not in any product.

Part 2: Context-as-Data

5. Prompt/Context Management Tools

Tool	Focus	Key Features	Pricing	Gap
LangSmith	LangChain tracing + prompts	Version tracking, execution logs, evaluation	Free tier, paid plans	LangChain-locked, no composable blocks, no marketplace
PromptLayer	Versioning & tracking	Log, version, A/B test prompts	Paid	No composable context, no marketplace
Humanloop	Non-engineer-friendly prompt management	Version control, evaluation workflows, UI for non-tech	Paid	No composable blocks, no marketplace
Langfuse	Open-source observability	A/B testing via prompt labeling, tracing	Open-source + cloud	No composable context blocks
Vellum	Enterprise AI development	Jinja templating, workflows, function calling, prompt engineering	Enterprise pricing (Vellum docs)	Enterprise-only, no marketplace
Braintrust	Evaluation-first	Test suites, scoring	Enterprise	No prompt marketplace
Parea	Prompt management	Testing, versioning	Paid	Small footprint

What exists for versioned, composable context blocks:

Langfuse allows labeled prompt versions (e.g., "prod-a", "prod-b") for A/B testing (Langfuse docs).
Vellum supports Jinja templating for dynamic prompts (Vellum docs).
Anthropic published "Effective Context Engineering for AI Agents" — treating context as a first-class engineering concern (Anthropic blog).

What's missing:

Composable context blocks — No tool lets you define reusable, versioned context fragments (e.g., "system prompt for legal summarization v2.3") that compose across applications.
Context performance metrics — No tool tracks which context fragments produce better outcomes.
Cross-organization sharing — No platform lets teams share proven context blocks with usage metrics.
Context marketplace — No marketplace for validated, performant prompts/context.

6. Context as an Asset Class

If prompts and context are versioned data assets with performance metrics, the marketplace opportunity is real:

Existing evidence:

The AI prompt marketplace was valued at $1,406M in 2024, projected to reach $10,992.4M by 2033 at 25.9% CAGR (Grand View Research).
PromptBase hosts 270,000+ prompts for sale (PromptBase).
PromptCow, Prompts-Market.com — emerging marketplaces for ChatGPT/Midjourney prompts (Reddit).

What a "context registry" would look like:

Context Block: legal-summarization-v2.3
├── Type: system-prompt
├── Model compatibility: claude-sonnet, gpt-4o
├── Performance metrics:
│   ├── ROUGE-L: 0.82 (n=1,200 evaluations)
│   ├── User satisfaction: 4.6/5 (n=340 ratings)
│   ├── Token efficiency: 847 avg output tokens
│   └── Latency: p95=1.2s
├── Version history: v1.0 → v2.3
├── Dependencies: [legal-glossary-v1.1, jurisdiction-filter-v3.0]
├── License: commercial / CC-BY-4.0
└── Price: $0.002 per invocation or $49/mo subscription

Marketplace dynamics:

Supply side: AI engineers create and validate context blocks, earn recurring revenue.
Demand side: SaaS companies buy proven context blocks instead of reinventing.
Platform moat: Performance data creates a quality signal that reputably ranks context blocks — a natural ranking mechanism.

7. Context Observability

Which context fragments actually influenced output?

This is an emerging research area:

AttnTrace (2025) — Attention-based context traceback for long-context LLMs. Can identify which context fragments influenced output and improve prompt injection detection (arXiv).
Feature attribution — Gradient-based and attention-based methods for attributing output to specific input tokens (Hugging Face blog).
LLM Observability design principles — ACM paper proposes Design for Awareness, Monitoring, Intervention, and Operability (ACM DL).

Token-level cost attribution:

Most observability platforms (Helicone, LangSmith, Langfuse) track token counts per request.
None attribute cost to which context fragment consumed tokens.
This matters when composable context blocks are assembled from multiple sources — who pays for the tokens?

Product landscape gap: No product offers "context fragment observability" — tracking which fragments of an assembled context actually influenced the output, enabling fair cost attribution and quality measurement.

Part 3: Payment & Billing for AI

8. Usage-Based Billing Infrastructure

Platform	Type	Token-Level Metering	Key Features	Gaps
Stripe Billing	Payment + billing	✅ Metered billing, per-token charges	Hybrid pricing (sub + usage), 40+ webhook events, excellent DX	Not MoR — you handle tax. Requires business entity in supported country. 2.9% + $0.30 per tx (Remery/Athenic)
Lago	Open-source billing	✅ Real-time metering, 1M events/sec	AGPLv3, 9,457 GitHub stars, hybrid pricing, full code ownership, self-hostable (Lago, ColdIQ)	No payment processing — billing engine only, needs Stripe/Paddle for collection
Metronome	Usage billing SaaS	✅ Event-driven metering	Enterprise-grade, entitlements, real-time rating	Expensive, enterprise-only, no self-hosting (Stigg)
Amberflo	Usage metering	✅ Purpose-built for metering	High-throughput event ingestion, real-time dashboards	Metering only — needs billing platform for invoicing
Orb	Usage billing	✅ SQL-based pricing, developer-first	Best DX for usage pricing, flexible rating	Newer, less enterprise validation (Orb)
Togai	Usage billing	✅ Event metering	Credit/grant systems, hybrid pricing	Smaller footprint

Which handle token-level metering well?

Lago: Best for full control + self-hosting. Open-source, real-time metering.
Stripe Billing: Best integration + DX. Metered billing API is mature.
Metronome/Orb: Best for enterprise usage billing.
Gap: None natively understand LLM tokens (input/output/context/rag distinction). All treat tokens as generic metered events.

9. MoR (Merchant of Record) for AI APIs

Platform	MoR?	Per-Token Billing	AI-Native Pricing	Key Features
Paddle	✅	Limited (basic usage-based)	❌	Handles all tax/compliance, 200+ countries, 5% + $0.50, subscription management (Paddle)
Lemon Squeezy	✅	❌ Very limited	❌	Simplest setup, no business entity required, 5% + $0.50, 135+ countries (Lemon Squeezy docs)
Dodo Payments	✅	✅ LLM ingestion blueprints	Partial	AI-specific MoR, handles token metering, supports OpenAI/Anthropic SDKs, Vietnam-friendly (Dodo Payments)
Gumroad	✅	❌	❌	Digital products only, not SaaS-friendly

The gap: No MoR natively handles per-token, multi-model billing. Paddle and Lemon Squeezy handle subscriptions well but struggle with usage-based AI pricing. Dodo Payments is the closest with LLM ingestion blueprints, but is still early-stage.

10. AI API Cost Passthrough

How AI SaaS companies handle model cost → customer billing:

Strategy	Description	Typical Margin	When It Works	Source
Pass-through	Charge at/near provider rate + small fee	<50% markup	Thin wrappers, sophisticated buyers	(Dodo Payments blog)
2x markup	Charge 2x the underlying model cost	2x	Modest engineering value above model	Same
3x markup	Standard SaaS margin	3x	Strong engineering value, sales motion needed	Same
4x+ premium	Premium product pricing	4x+	Substantial value beyond model, defensible	Same
Credit system	Pre-purchased credits, each worth N tokens	Variable	Developer tools, transparent	(Stripe AI pricing guide)
Flat subscription + usage overage	Base fee covers some usage, overages metered	2-3x on base, 1.5x on overage	Mature AI SaaS, protects margins	Same
Outcome-based	Charge per resolved ticket / generated lead	10-100x+	Vertical AI with measurable outcomes	(Bessemer Venture Partners)
Per-seat + usage	Seat fee + token consumption	Variable	Team products, enterprise	(Stripe)
Dynamic pricing	Price adjusts based on model cost in real-time	Variable	API marketplaces	(Software Pricing)
Token-tiered	Different per-token rates at different volumes	Declining margin at scale	High-volume API businesses	Industry pattern

Key insight from Bessemer's AI Pricing Playbook: "AI pricing strategy isn't like SaaS. Emerging AI business models price for outcomes, not access." (BVP)

11. Payment for Vietnam/No-US-Identity

The Vietnam problem:

Stripe is not officially available in Vietnam — requires foreign incorporation (Dodo Payments).
PayPal has faced compliance challenges in Vietnam (Vietnam News).
Vietnam's Foreign Contractor Tax (FCT) applies to remote digital service sales, combining VAT and income tax.
Vietnam's digital economy is poised to reach $49B by 2025 (Saigon Times via Dodo Payments).

MoR platforms for VN-based devs:

Platform	VN Support	Entity Required	Tax Handling	Payout
Paddle	✅ (can sell from VN)	No (MoR model)	Full tax compliance, 200+ countries	Bank transfer, 45+ currencies
Lemon Squeezy	✅	No (individual OK)	Full MoR, 135+ countries	PayPal, bank transfer
Dodo Payments	✅ (VN-specific blog)	No	Full MoR + FCT-aware	Bank transfer, multi-currency
Gumroad	✅	No	Partial MoR	PayPal

Can a solo dev in Vietnam collect global AI API payments without US entity? Yes, via MoR platforms (Paddle, Lemon Squeezy, Dodo Payments). They act as the legal seller, handle tax, and remit payouts. The trade-off is higher fees (5% vs Stripe's 2.9%) and limited usage-based billing support. Dodo Payments is the most AI-native option with LLM ingestion blueprints.

Tax implications for VN-based AI SaaS:

Vietnam charges 5% VAT on digital services. FCT combines VAT + CIT for foreign contractors.
MoR handles this on the buyer side (charging/remitting buyer's local taxes).
The VN developer still owes Vietnamese income tax on profits — typically 20% CIT for companies, or personal income tax (5-35% progressive) for individuals.
No double taxation treaty benefit for digital services in most cases.

Part 4: The Integrated Product

12. Product Architecture

AI Middleware-as-a-Service (AIMaaS) — Four integrated layers:

┌──────────────────────────────────────────────────────────┐
│                    CUSTOMER APPLICATION                    │
├──────────────────────────────────────────────────────────┤
│  LAYER 4: PAYMENT & BILLING                               │
│  ┌────────────┐ ┌───────────────┐ ┌──────────────────┐   │
│  │Metering     │ │Rating Engine  │ │MoR Integration   │   │
│  │Engine       │ │(pricing rules)│ │(Paddle/Dodo/Lago)│   │
│  └────────────┘ └───────────────┘ └──────────────────┘   │
├──────────────────────────────────────────────────────────┤
│  LAYER 3: CONTEXT REGISTRY                                │
│  ┌────────────┐ ┌───────────────┐ ┌──────────────────┐   │
│  │Versioned    │ │Performance    │ │Marketplace       │   │
│  │Context Store│ │Metrics        │ │(buy/sell blocks) │   │
│  └────────────┘ └───────────────┘ └──────────────────┘   │
├──────────────────────────────────────────────────────────┤
│  LAYER 2: INTELLIGENT ROUTER                               │
│  ┌────────────┐ ┌───────────────┐ ┌──────────────────┐   │
│  │Cost-Aware   │ │Latency-Aware │ │Fallback Engine    │   │
│  │ML Router    │ │Rules Engine   │ │(cascade chains)   │   │
│  └────────────┘ └───────────────┘ └──────────────────┘   │
├──────────────────────────────────────────────────────────┤
│  LAYER 1: BYOK GATEWAY                                     │
│  ┌────────────┐ ┌───────────────┐ ┌──────────────────┐   │
│  │Key Vault    │ │Per-Tenant     │ │OpenAI-Compatible │   │
│  │(HSM/AWS KMS)│ │Isolation      │ │API Surface       │   │
│  └────────────┘ └───────────────┘ └──────────────────┘   │
├──────────────────────────────────────────────────────────┤
│  MODEL PROVIDERS                                           │
│  [OpenAI] [Anthropic] [Google] [AWS Bedrock] [Azure] ...   │
└──────────────────────────────────────────────────────────┘

API Design:

# Core API surface (OpenAI-compatible + extensions)
POST /v1/chat/completions          # Standard completion (routes through BYOK or platform keys)
POST /v1/context/blocks            # Create versioned context block
GET  /v1/context/blocks/{id}       # Get block with performance metrics
POST /v1/context/assemble           # Assemble context from blocks → prompt
GET  /v1/metering/usage            # Token usage per tenant/context/model
POST /v1/billing/subscribe         # Subscribe customer to plan
GET  /v1/billing/invoice           # Get invoice with token-level breakdown
POST /v1/keys/register             # End-user registers provider key (BYOK)
POST /v1/keys/rotate               # Rotate a registered key
POST /v1/routing/config            # Configure routing rules + budgets

Data Model:

Tenant
├── ApiKey (provider keys, encrypted)
├── RoutingConfig (rules, budgets, fallbacks)
├── ContextBlocks[] (versioned prompts/fragments)
├── MeteringEvents[] (per-request token counts)
├── BillingAccount (plan, payment method)
└── Invoices[] (with token-level line items)

Deployment Architecture:

Gateway: Cloudflare Workers (edge, <50ms added latency) or Docker (self-hosted)
Router ML: Lightweight BERT classifier + matrix factorization (à la RouteLLM), trained on preference data, updated weekly
Key Vault: AWS KMS / GCP KMS for encryption, per-tenant key isolation
Metering: ClickHouse for high-throughput event storage (à la Helicone's architecture)
Billing Engine: Lago (open-source) as core, with MoR integrations (Paddle, Dodo Payments)
Context Store: PostgreSQL + S3 for versioned blocks, with search via vector embeddings

13. Competitive Moat Analysis

Moat Type	Strength	Details
Data moat	Strong	Routing performance data + context performance metrics compound over time. More traffic → better routing → more savings → more customers. This is a genuine flywheel.
Network effects	Moderate (two-sided)	Context marketplace: more producers → more consumers → more producers. BYOK gateway: more tenants → more provider integrations → more tenants.
Switching costs	Strong	Once a SaaS integrates the gateway + billing + context registry, migration requires re-implementing all four layers. High integration depth = high switching cost.
Integration depth	Strong	Each layer reinforces the others: BYOK keys → routing decisions → context assembly → metering → billing. Using one layer makes the others more valuable.
Open-source defense	Moderate	Open-source the gateway core (like LiteLLM) to commoditize routing primitives, monetize the integrated product (billing, context, BYOK management).

Defensibility assessment: The strongest moat is the data moat from routing + context performance data. No competitor can replicate the accumulated performance data from thousands of routing decisions and context block evaluations. This is the "Google PageRank for AI routing" opportunity.

14. Revenue Model

Revenue Stream	Mechanism	Projected % of Revenue	Year 1 Est.
Payment take rate	2-5% on payment flow through MoR integration	40-50%	$2-5M
Per-request routing fee	$0.0001-$0.001 per routed request (volume-tiered)	20-25%	$1-3M
Context registry subscription	$49-$499/mo for marketplace access + publishing	15-20%	$0.5-2M
Enterprise contracts	Custom pricing for high-volume, dedicated support	10-15%	$0.5-1M

Pricing scenarios:

Scenario	Customers	Avg Revenue/Customer	Total ARR
Conservative	500 (Y1)	$10K/yr	$5M
Base case	1,000 (Y1)	$12K/yr	$12M
Optimistic	2,000 (Y1)	$15K/yr	$30M

Breakdown (base case):

600 customers on payment flow: avg $8K/yr in take rate = $4.8M
800 customers on routing: avg 5M requests/mo × $0.0005 = $24K/yr = $19.2M total (but early customers will be smaller)
300 context registry subs: avg $150/mo = $540K
20 enterprise contracts: avg $50K/yr = $1M
Realistic Year 1: $5-15M ARR

15. Go-to-Market

Wedge strategy: BYOK Gateway first.

The BYOK gateway is the sharpest wedge because:

It solves an immediate, painful problem (end-user API key management).
It's the easiest layer to adopt independently (swap base URL → done).
It naturally leads to routing (once you have keys, route intelligently).
It creates the data flow needed for metering and billing.

Three-phase GTM:

Phase 1 (Months 1-6): Open-Source Gateway + BYOK

Open-source the BYOK gateway core (MIT license).
Cloud-hosted version with key management, rotation, per-tenant isolation.
Developer-first: one-line integration, OpenAI-compatible API.
Target: AI wrapper tools, IDE extensions, chat apps.
Revenue: $0 (open-source adoption play).

Phase 2 (Months 6-12): Routing + Metering

Add intelligent routing (ML-based, à la RouteLLM).
Add metering (token counts per user/model/context).
Target: AI SaaS companies with 10+ customers needing multi-model routing.
Revenue: Per-request routing fee + metering API.

Phase 3 (Months 12-18): Context + Billing

Add context registry (versioned blocks, performance metrics).
Add billing integration (Lago core + Paddle/Dodo MoR).
Add context marketplace (two-sided).
Target: Established AI SaaS needing billing + context management.
Revenue: Payment take rate + context subscriptions + enterprise contracts.

Why developer-first:

LiteLLM's 15K GitHub stars prove the developer demand for unified routing.
OpenRouter's 400+ model access proves the marketplace demand.
The gap is production-grade with billing — that's the monetization trigger.

16. Solo Dev Feasibility

Can one person build this?

Realistic assessment: The MVP yes, the full product no.

MVP (1 person, 3-6 months):

BYOK gateway with encrypted key storage, per-tenant isolation, OpenAI-compatible API
Basic routing (fallback chains + cost-based rules — not ML yet)
Token metering (per-request counts, stored in ClickHouse)
Basic billing (Stripe integration, credit-based system)
No context registry, no marketplace

What to open-source:

BYOK gateway core (community building, adoption)
Basic routing engine (commoditize the commodity)

What to monetize:

Hosted BYOK management (key rotation, isolation, SOC2 compliance)
Intelligent routing (ML models, the data flywheel)
Billing integration (the payment take rate is the revenue)
Context registry & marketplace

What requires a team (Phase 2+):

ML router training and maintenance
MoR integrations across jurisdictions
Context marketplace curation and quality control
Enterprise sales and support

Vietnam-based solo dev targeting global market:

✅ Low cost of living = longer runway
✅ MoR platforms (Paddle, Dodo Payments) solve the payment/identity problem
✅ No US entity needed for MoR-based sales
⚠️ Time zone challenges for US enterprise customers
⚠️ Limited access to US VC networks (but not required for bootstrapping)
⚠️ Stripe unavailable directly — must use MoR

17. Risks

Risk	Severity	Mitigation
Provider consolidation — Major labs (OpenAI, Anthropic, Google) build their own routing/billing	High	Open-source core + integrated product. When providers consolidate, they consolidate routing, not end-user BYOK management or context registries. The BYOK wedge is provider-agnostic by definition.
Open-source commoditization — LiteLLM improves, adds billing, becomes "good enough"	Medium	LiteLLM is a config file, not a product. The gap between config and product (key management, billing, MoR, context) is enormous. Stay ahead on integrated value.
Regulatory — MiFID-style regulation for AI APIs (obligation to best execute)	Low (currently)	Monitor EU AI Act developments. If routing regulation emerges, compliance becomes a barrier to entry (helps incumbents).
Payment compliance complexity — Tax laws change per jurisdiction	Medium	MoR partners (Paddle, Dodo) absorb this risk. Don't become a payment processor yourself.
BYOK security breach — Key leakage = existential trust failure	High	HSM/AWS KMS for encryption. Penetration testing. Bug bounty. SOC2 from day 1. Insurance.
Context marketplace quality — Low-quality blocks erode trust	Medium	Curation + performance metrics + community rating. Require minimum evaluation count before marketplace listing.
Competition from Cloudflare — Free AI Gateway + Workers AI could add billing	High	Cloudflare is the biggest threat. They have the distribution. Differentiate on: end-user BYOK, context registry, MoR integration, ML routing. Cloudflare won't build MoR.
Solo dev scaling risk — Burnout, single point of failure	Medium	Phase 1 is solo-feasible. Phase 2 needs at least 1-2 hires. Revenue from Phase 1 funds Phase 2.

Conclusion

The AI Middleware-as-a-Service opportunity sits at the intersection of four fragmented markets: LLM gateways ($8.4B+ in API spend), prompt management ($1.4B marketplace), usage-based billing (multiple $100M+ ARR companies), and MoR payments ($B+ market). No existing product spans all four layers. The proposed platform — starting with an open-source BYOK gateway wedge and expanding into routing, context, and billing — addresses a real and growing pain point for the 10,000+ AI SaaS companies reinventing this infrastructure. The data moat from routing + context performance is the strongest defensible advantage. A Vietnam-based solo developer can build the MVP and reach initial revenue via MoR platforms, though scaling past $1M ARR requires a team.

Sources

Exploding Topics — "How Many AI Companies Are There?" — https://explodingtopics.com/blog/number-ai-companies
Coherent Market Insights — AI Created SaaS Market — https://www.coherentmarketinsights.com/industry-reports/ai-created-saas-market
Maxim AI — "Top AI Gateways to Reduce LLM Cost and Latency" — https://www.getmaxim.ai/articles/top-ai-gateways-to-reduce-llm-cost-and-latency/
PkgPulse — "Portkey vs LiteLLM vs OpenRouter: LLM Gateway 2026" — https://www.pkgpulse.com/guides/portkey-vs-litellm-vs-openrouter-llm-gateway-2026
Portkey Blog — "Series A Funding" — https://portkey.ai/blog/series-a-funding
Tracxn — Portkey Profile — https://tracxn.com/d/companies/portkey/__ZBFkMQ22qjERQNfNQH39gbt9Y3bf72VJNqiydQkp6qU
LMSYS — "RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing" — https://lmsys.org/blog/2024-07-01-routellm/
Helicone — "Complete Guide to LLM Observability Platforms" — https://www.helicone.ai/blog/the-complete-guide-to-LLM-observability-platforms
OpenRouter — BYOK Documentation — https://openrouter.ai/docs/guides/overview/auth/byok
SurfMind — "BYOK Explained" — https://surfmind.ai/blog/byok-bring-your-own-key-future-of-ai-tools
JetBrains Blog — "BYOK Now Live" — https://blog.jetbrains.com/ai/2025/12/bring-your-own-key-byok-is-now-live-in-jetbrains-ides/
Stripe — "AI SaaS Pricing Models" — https://stripe.com/resources/more/ai-saas-pricing-models
Dodo Payments — "Claude Code and Margin Pass Through" — https://dodopayments.com/blogs/claude-code-margin-pass-through
Dodo Payments — "Merchant of Record in Vietnam" — https://dodopayments.com/blogs/merchant-of-record-vietnam
Remery/Athenic — "Stripe vs Paddle vs Lemon Squeezy" — https://getathenic.com/blog/stripe-vs-paddle-vs-lemon-squeezy-saas-billing
Lemon Squeezy Docs — Supported Countries — https://docs.lemonsqueezy.com/help/getting-started/supported-countries
Lago — Open-Source Billing Infrastructure — https://getlago.com/
ColdIQ — "Hyperline vs Metronome vs Lago vs Orb" — https://coldiq.com/blog/hyperline-vs-metronome-vs-lago-vs-orb-which-billing-platform-handles-subscription-usage-pricing-best
Grand View Research — AI Prompt Marketplace Market Report — https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-ai-prompt-marketplace-market-report
PromptBase — AI Prompt Marketplace — https://promptbase.com/
arXiv — AttnTrace: Attention-based Context Traceback — https://arxiv.org/html/2508.03793v1
Anthropic — "Effective Context Engineering for AI Agents" — https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Langfuse — A/B Testing Documentation — https://langfuse.com/docs/prompt-management/features/a-b-testing
Vellum — Prompt Engineering Documentation — https://docs.vellum.ai/product/prompts/prompt-engineering
Martian — Website — https://withmartian.com/
HPCwire — "Martian Raises $9M" — https://www.hpcwire.com/bigdatawire/this-just-in/martian-raises-9m-for-advanced-model-mapping-to-enhance-llm-performance-and-accuracy/
Cloudflare Blog — "AI Gateway Aug 2025 Refresh" — https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/
BVP — "The AI Pricing and Monetization Playbook" — https://www.bvp.com/atlas/the-ai-pricing-and-monetization-playbook
arXiv — Cost-Aware Contrastive Routing — https://arxiv.org/html/2508.12491v1
GitHub — Awesome Routing LLMs — https://github.com/MilkThink-Lab/Awesome-Routing-LLMs
SEO.ai — "How Many SaaS Companies Are There" — https://seo.ai/blog/how-many-saas-companies-are-there
BetterCloud — "The Big List of 2026 SaaS Statistics" — https://www.bettercloud.com/monitor/saas-statistics/