Core Relay System

1 Overview

The Core Relay System is the execution backbone of the LLMs x420 Gateway. It orchestrates the full lifecycle of an AI service request, from authentication and request parsing through intelligent routing and upstream communication to response delivery, across diverse LLM providers such as OpenAI, Claude, and Gemini, as well as multimodal services including image and audio generation.

Its mission: to provide a standardized, OpenAI-compatible gateway that dynamically integrates multiple upstream intelligence providers via programmable routing, quota enforcement, and resilient middleware.


2 Architecture Design Principles

The system follows a layered middleware architecture, where each processing layer is responsible for a specific operational concern:

  • Authentication & Context Loading

  • Channel Routing & Load Distribution

  • API Format Normalization

  • Token Accounting & Quota Enforcement

  • Fault Tolerance & Retry

  • Format-Specific Dispatching

  • Logging & Real-Time Response

This modularity keeps the pipeline composable, observable, and resilient, which makes it well suited to high-throughput, agent-driven applications.
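
As a rough illustration of this layering, the sketch below composes the concerns as ordinary Go HTTP middleware. The chain helper and the layer names are hypothetical stand-ins, not the gateway's actual code.

```go
package main

import (
	"log"
	"net/http"
)

// Middleware wraps a handler with one operational concern.
type Middleware func(http.Handler) http.Handler

// chain applies middleware in order, outermost first.
func chain(h http.Handler, mw ...Middleware) http.Handler {
	for i := len(mw) - 1; i >= 0; i-- {
		h = mw[i](h)
	}
	return h
}

// layer returns a stub middleware that logs its concern; each stub stands
// in for a real layer such as authentication, routing, quota, or retry.
func layer(name string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			log.Printf("enter layer: %s", name)
			next.ServeHTTP(w, r)
		})
	}
}

func main() {
	// Innermost handler: where format-specific dispatch would happen.
	relay := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("relayed\n"))
	})
	handler := chain(relay,
		layer("auth"), layer("distribute"), layer("normalize"),
		layer("quota"), layer("retry"),
	)
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```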

Core Design Tenets:

  • Format Agnosticism: Seamless translation across OpenAI, Claude, Gemini, and custom schemas

  • Pessimistic Quota Control: Reserve-before-call model to guarantee cost containment

  • Smart Retry Logic: Built-in failover, adaptive retries, and channel auto-healing


3 Authentication Layer

The gateway supports multiple credential types, including:

  • Bearer Tokens (OpenAI-compatible)

  • WebSocket session headers

  • Claude API keys

  • Gemini credentials

  • Midjourney access secrets

Once authenticated, the system loads user metadata (e.g., quota class, allowed models, IP policy) and resolves access rights. This lays the groundwork for:

  • Dynamic access control

  • Group-level feature gating

  • Quota scope enforcement

  • Billing traceability
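
A minimal sketch of what this layer might look like in Go follows. UserMeta, lookupToken, and the Bearer-only handling are illustrative assumptions; the real gateway also accepts WebSocket session headers and provider-specific keys.

```go
package relay

import (
	"context"
	"net/http"
	"strings"
)

// UserMeta is a hypothetical view of the metadata loaded after authentication.
type UserMeta struct {
	QuotaClass    string
	AllowedModels []string
	Group         string
}

type userKey struct{}

// lookupToken stands in for the real credential store; any non-empty
// token resolves to a demo user here.
func lookupToken(token string) (*UserMeta, bool) {
	if token == "" {
		return nil, false
	}
	return &UserMeta{QuotaClass: "default", AllowedModels: []string{"gpt-4o"}, Group: "free"}, true
}

// authenticate resolves a Bearer token, loads the caller's metadata, and
// attaches it to the request context for the routing and quota layers.
func authenticate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		meta, ok := lookupToken(token)
		if !ok {
			http.Error(w, "invalid credentials", http.StatusUnauthorized)
			return
		}
		ctx := context.WithValue(r.Context(), userKey{}, meta)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```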


4 Channel Distribution & Load Routing

The distribution layer handles channel selection based on:

  • Token configuration (explicit channels)

  • Weighted round-robin (implicit channels)

  • Model availability

  • Group permissions

  • Real-time channel health scores

During retries, the system excludes previously failed channels and deterministically selects alternatives. This adaptive routing ensures:

  • Graceful degradation

  • Intelligent failover

  • Avoidance of retry loops

Source: distributor.go:28–123
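
The sketch below shows one common way to realize weighted distribution with failed-channel exclusion, here as a weighted random draw over the remaining healthy channels. The types and field names are assumptions; the authoritative logic lives in distributor.go.

```go
package relay

import (
	"errors"
	"math/rand"
)

// Channel is a hypothetical upstream channel with a routing weight.
type Channel struct {
	ID     int
	Weight int
}

// pickChannel draws a channel in proportion to its weight, skipping any
// IDs that already failed during this request's retry sequence.
func pickChannel(channels []Channel, failed map[int]bool) (*Channel, error) {
	total := 0
	candidates := make([]Channel, 0, len(channels))
	for _, c := range channels {
		if failed[c.ID] || c.Weight <= 0 {
			continue // excluded: previously failed or zero-weighted
		}
		candidates = append(candidates, c)
		total += c.Weight
	}
	if total == 0 {
		return nil, errors.New("no healthy channel available")
	}
	// Pick a point in [0, total) and walk the cumulative weights.
	n := rand.Intn(total)
	for i := range candidates {
		if n < candidates[i].Weight {
			return &candidates[i], nil
		}
		n -= candidates[i].Weight
	}
	return nil, errors.New("unreachable")
}
```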


5 Format Parsing & Validation

Supports native parsing for:

  • OpenAI (chat/completion/embedding)

  • Claude (messages, system prompts)

  • Gemini (generation, embedding)

  • Suno, Midjourney, Audio-to-Text (specialized schemas)

Each format is converted into a unified internal structure: RelayInfo. This holds all relevant metadata:

  • Model alias

  • Streaming flags

  • Token budget

  • Channel hints

  • Logging trace ID
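
The struct below is an illustrative reconstruction of such a container, mapping the bullets above onto fields; the field names are assumptions rather than the actual definition.

```go
package relay

// RelayInfo is an illustrative reconstruction; field names are assumptions.
type RelayInfo struct {
	OriginModel   string // model alias as requested by the client
	UpstreamModel string // model name after alias resolution
	IsStream      bool   // streaming flag
	PromptTokens  int    // estimated token budget for the prompt
	MaxTokens     int    // completion budget, if the client set one
	ChannelHint   int    // preferred channel from token configuration (0 = none)
	TraceID       string // logging trace ID threaded through all layers
}
```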


6 Token & Cost Computation

The gateway performs pre-consumption estimation using model-specific tokenizers and modality-aware counting:

  • Text: Tokenized with per-model overhead handling

  • Images: Counted via dimension × quality heuristics

  • Audio: Duration-based pricing

  • Tools: Accounted as extra token blocks

Final cost is determined by:

  • Model price profile

  • User multiplier (e.g., developer tier)

  • Ratio of prompt vs completion tokens

This ensures quota safety and billing precision before hitting upstream APIs.
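
A worked sketch of the pricing arithmetic, assuming a ratio-based profile in which completion tokens are priced relative to prompt tokens and the result is scaled by model and user multipliers. The names and the exact formula shape are assumptions.

```go
package relay

// PriceProfile holds placeholder pricing knobs, not real values.
type PriceProfile struct {
	ModelRatio      float64 // per-model base price multiplier
	CompletionRatio float64 // completion tokens priced relative to prompt tokens
}

// estimateQuota turns token counts into a quota amount:
// (prompt + completion*completionRatio) * modelRatio * userMultiplier.
func estimateQuota(promptTokens, completionTokens int, p PriceProfile, userMultiplier float64) int64 {
	weighted := float64(promptTokens) + float64(completionTokens)*p.CompletionRatio
	return int64(weighted * p.ModelRatio * userMultiplier)
}
```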


7 Quota Enforcement

Quota enforcement uses a pessimistic strategy:

  • Pre-consumption: Quota is deducted up front, based on the estimate

  • Failure Refund: If the upstream call fails, the reserved quota is returned

  • Post-call Adjustment: Actual usage is reconciled against the estimate once the response completes

Benefits:

  • Prevents runaway calls

  • Ensures billing consistency

  • Supports granular user-level metering
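
A minimal in-memory sketch of this reserve-then-settle flow. The real backend is presumably a database; the type and method names here are invented for illustration.

```go
package relay

import (
	"errors"
	"sync"
)

// QuotaStore is a minimal in-memory stand-in for the real quota backend.
type QuotaStore struct {
	mu      sync.Mutex
	balance map[string]int64
}

func NewQuotaStore() *QuotaStore {
	return &QuotaStore{balance: make(map[string]int64)}
}

// Reserve deducts the estimated cost up front, so a call that would
// exceed the balance is rejected before any upstream request is made.
func (s *QuotaStore) Reserve(user string, estimate int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.balance[user] < estimate {
		return errors.New("quota exceeded")
	}
	s.balance[user] -= estimate
	return nil
}

// Settle reconciles after the response: refund the whole reservation on
// failure, otherwise settle the difference between estimate and actual.
func (s *QuotaStore) Settle(user string, estimate, actual int64, failed bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if failed {
		s.balance[user] += estimate
		return
	}
	s.balance[user] += estimate - actual // negative if usage exceeded the estimate
}
```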


8 Intelligent Retry Engine

Retries are handled based on error classification:

  • 5xx / 429: Retry with a new route

  • Timeout: Retry once

  • 4xx: Abort fast

Channel-specific errors trigger:

  • Logging to channel health registry

  • Optional automatic disablement

  • Alternate channel selection on retry

This provides self-healing, high-availability behavior under partial outages.
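
The classification above might be encoded roughly as follows. The timeout flag would come from the HTTP client error; whether a timeout retry keeps the same route is not specified here, so that detail is an assumption.

```go
package relay

import "net/http"

// retryDecision encodes the error classification above.
func retryDecision(status int, isTimeout bool, attempt int) (retry, newRoute bool) {
	switch {
	case isTimeout:
		return attempt == 0, false // retry once, on the same route (assumed)
	case status == http.StatusTooManyRequests || status >= 500:
		return true, true // retry with a new route
	case status >= 400:
		return false, false // client error: abort fast
	default:
		return false, false
	}
}
```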


9 Format-Specific Handlers

Requests are routed to dedicated adaptors:

  • OpenAI: WebSocket (realtime) + REST

  • Claude: Prompt rewriting, “thinking” mode

  • Gemini: Embedding / generation dispatch

  • Image: Midjourney relay, callback logic

  • Audio: Whisper / TTS handler

Adaptors abstract upstream calls and normalize responses into the OpenAI format before streaming back to the client.
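
A sketch of what such an adaptor contract could look like, reusing the hypothetical RelayInfo from Section 5. The interface shape and the pass-through OpenAI adaptor are assumptions, not the gateway's actual API.

```go
package relay

import (
	"bytes"
	"context"
	"io"
	"net/http"
)

// Adaptor is a sketch of the per-format handler contract: build the
// upstream request from the unified metadata, then normalize the
// provider response back into OpenAI-style JSON.
type Adaptor interface {
	BuildRequest(ctx context.Context, info *RelayInfo, body []byte) (*http.Request, error)
	ParseResponse(resp *http.Response, info *RelayInfo) ([]byte, error)
}

// openAIAdaptor can pass requests through nearly unchanged, since the
// gateway's wire format is already OpenAI-compatible.
type openAIAdaptor struct{ upstream string }

func (a *openAIAdaptor) BuildRequest(ctx context.Context, info *RelayInfo, body []byte) (*http.Request, error) {
	url := a.upstream + "/v1/chat/completions"
	return http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
}

func (a *openAIAdaptor) ParseResponse(resp *http.Response, info *RelayInfo) ([]byte, error) {
	defer resp.Body.Close()
	return io.ReadAll(resp.Body) // already in the target format
}
```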


10 Error Handling & Channel Stability

  • Error logs include request ID, trace path, payload shape, channel metrics

  • Automatic disabling thresholds prevent cascading channel failures

  • Re-enablement requires health check or admin override

  • Stats feed into performance dashboards and pricing policies
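
A toy version of a consecutive-failure breaker, illustrating the disabling-threshold idea; the threshold value and field names are placeholders, not the gateway's real settings.

```go
package relay

import "sync/atomic"

const disableThreshold = 5 // placeholder, not the gateway's real setting

// healthCounter trips a channel into the disabled state after a run of
// consecutive failures; re-enablement is left to a health check or admin.
type healthCounter struct {
	consecutiveFails atomic.Int64
	disabled         atomic.Bool
}

func (h *healthCounter) record(success bool) {
	if success {
		h.consecutiveFails.Store(0)
		return
	}
	if h.consecutiveFails.Add(1) >= disableThreshold {
		h.disabled.Store(true)
	}
}
```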


11 Streaming & Logging

  • Streaming responses are pipelined directly to the client with less than 150 ms of delay

  • Token, cost, duration, and channel metadata are recorded

  • Logs are shipped to internal analytics and the billing engine for reconciliation
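
A minimal sketch of the streaming pipeline using Go's http.Flusher, the standard mechanism for pushing server-sent events incrementally; the function name is illustrative.

```go
package relay

import (
	"bufio"
	"io"
	"net/http"
)

// pipeStream forwards upstream SSE lines to the client as they arrive,
// flushing after every line so tokens are visible with minimal delay.
func pipeStream(w http.ResponseWriter, upstream io.Reader) error {
	flusher, ok := w.(http.Flusher)
	if !ok {
		return http.ErrNotSupported
	}
	w.Header().Set("Content-Type", "text/event-stream")
	scanner := bufio.NewScanner(upstream)
	for scanner.Scan() {
		if _, err := w.Write(append(scanner.Bytes(), '\n')); err != nil {
			return err
		}
		flusher.Flush() // push the chunk immediately instead of buffering
	}
	return scanner.Err()
}
```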


12 Specialized Relay Modes

The system supports asynchronous workflows for advanced use cases:

  • Midjourney: Image task queuing, status polling, retry

  • Suno / Video: Task-style async relay with callback registration

  • Tool Calls: Multimodal request proxying and function chaining
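
A sketch of the task-style async pattern these modes share: submit a job, then poll until completion or cancellation. The TaskClient interface is hypothetical.

```go
package relay

import (
	"context"
	"time"
)

// TaskClient is a hypothetical interface over an async provider such as
// Midjourney: Submit returns a task ID, Poll reports progress.
type TaskClient interface {
	Submit(ctx context.Context, payload []byte) (taskID string, err error)
	Poll(ctx context.Context, taskID string) (done bool, result []byte, err error)
}

// relayTask submits a job, then polls until it completes or the context
// is cancelled, mirroring the queue-and-poll flow described above.
func relayTask(ctx context.Context, c TaskClient, payload []byte, every time.Duration) ([]byte, error) {
	id, err := c.Submit(ctx, payload)
	if err != nil {
		return nil, err
	}
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-ticker.C:
			done, result, err := c.Poll(ctx, id)
			if err != nil {
				return nil, err
			}
			if done {
				return result, nil
			}
		}
	}
}
```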


13 Summary

The Core Relay System is more than a gateway — it's an intelligent execution graph that abstracts away multi-provider complexity, enforces quota integrity, and optimizes request outcomes.

Built for production-scale agent networks, its modular middleware enables LLMs x420 to act as a programmable compute marketplace — one where access, pricing, and reliability are enforced at the protocol level.
