feat(catalogue,gateway): model aliases (helexa/small, helexa/balanced, helexa/large)

Operators can now define tier aliases in models.toml:

    [aliases]
    "helexa/small" = "Qwen/Qwen3-1.7B"
    "helexa/balanced" = "Qwen/Qwen3-8B"
    "helexa/large" = "Qwen/Qwen3.6-27B"

A client request for `model: "helexa/small"` is resolved to the concrete
model id at routing time. The gateway also rewrites the proxied body's
`model` field to the concrete id so neuron sees a name that matches its
loaded handle (otherwise the harness rejects the request).

Motivated by the finger-in-the-wind benchmark: same "what's the capital
of Georgia" probe runs in 2.5s on the 1.7B vs 6.7s on the 27B with
identical correctness. Aliases let clients pick a latency tier without
hardcoding model ids, and let operators swap targets without changing
client code.

Changes:

* cortex-core: `ModelCatalogue` gains `aliases: HashMap<String, String>`
  + `resolve_alias(&str) -> &str`. Unit tests cover the basic resolution
  + TOML round-trip.
* cortex-gateway:
  * `RouteDecision` gains `resolved_model_id: String`. `router::resolve`
    consumes aliases at entry and threads the concrete id through.
  * Handlers (chat_completions, completions, anthropic_messages
    streaming + non-streaming) rewrite the body's `model` field with
    `rewrite_model_in_body` before proxying, using the resolved id for
    metrics labels, LRU touch, and the body itself.
  * `/v1/models` (Pass 4) emits each alias as its own entry mirroring
    the target's `loaded` flag, feasible_on, and locations; clients
    browsing the endpoint see both names and can pick either.
* `models.toml` declares the three tier aliases; `models.example.toml`
  documents the section as opt-in.
* Integration tests verify: end-to-end alias→concrete request flow,
  alias surfacing in /v1/models, and no-op fall-through for non-alias
  model ids.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 16:10:41 +03:00
parent becf61b9c1
commit 24e20dcb5c
5 changed files with 426 additions and 7 deletions
--- a/crates/cortex-core/src/catalogue.rs
+++ b/crates/cortex-core/src/catalogue.rs
@@ -2,6 +2,7 @@

 use crate::discovery::DeviceInfo;
 use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
 use std::path::Path;

 /// A model serving profile loaded from models.toml.
@@ -34,6 +35,14 @@ fn default_min_devices() -> u32 {
 pub struct ModelCatalogue {
    #[serde(default)]
    pub models: Vec<ModelProfile>,
+    /// Tier aliases — clients can send a request with `model: "helexa/small"`
+    /// and the gateway transparently rewrites + routes to the concrete
+    /// model id this maps to. Lets operators define latency/quality
+    /// tiers (`small`/`balanced`/`large`, `fast`/`thinking`, etc.)
+    /// without imposing knowledge of specific model ids on clients.
+    /// Loaded from the `[aliases]` table in models.toml.
+    #[serde(default)]
+    pub aliases: HashMap<String, String>,
 }

 impl ModelCatalogue {
@@ -70,6 +79,13 @@ impl ModelCatalogue {
    pub fn get(&self, model_id: &str) -> Option<&ModelProfile> {
        self.models.iter().find(|p| p.id == model_id)
    }
+
+    /// Resolve an alias to its concrete model id. Returns `id` verbatim
+    /// when it isn't an alias. Aliases never chain — operator config
+    /// is treated as flat — so this is a single lookup.
+    pub fn resolve_alias<'a>(&'a self, id: &'a str) -> &'a str {
+        self.aliases.get(id).map(String::as_str).unwrap_or(id)
+    }
 }

 impl ModelProfile {
@@ -164,4 +180,32 @@ mod tests {
        let devices = [device(0, 1_000), device(1, 1_000)];
        assert!(p.is_feasible_on("anywhere", &devices));
    }
+
+    #[test]
+    fn resolve_alias_returns_target_when_alias_present() {
+        let mut cat = ModelCatalogue::default();
+        cat.aliases
+            .insert("helexa/small".into(), "Qwen/Qwen3-1.7B".into());
+        assert_eq!(cat.resolve_alias("helexa/small"), "Qwen/Qwen3-1.7B");
+    }
+
+    #[test]
+    fn resolve_alias_passes_through_when_not_an_alias() {
+        let mut cat = ModelCatalogue::default();
+        cat.aliases
+            .insert("helexa/small".into(), "Qwen/Qwen3-1.7B".into());
+        assert_eq!(cat.resolve_alias("Qwen/Qwen3-8B"), "Qwen/Qwen3-8B");
+    }
+
+    #[test]
+    fn aliases_table_round_trips_through_toml() {
+        let src = r#"
+[aliases]
+"helexa/small" = "Qwen/Qwen3-1.7B"
+"helexa/large" = "Qwen/Qwen3.6-27B"
+"#;
+        let cat: ModelCatalogue = toml::from_str(src).expect("parse aliases table");
+        assert_eq!(cat.resolve_alias("helexa/small"), "Qwen/Qwen3-1.7B");
+        assert_eq!(cat.resolve_alias("helexa/large"), "Qwen/Qwen3.6-27B");
+    }
 }
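
Note on the "aliases never chain" comment in `resolve_alias`: the tests in this file cover a direct hit and a pass-through, but not an alias whose target is itself an alias. The sketch below is a minimal, stdlib-only illustration of what a single lookup does in that case. The free function `resolve_alias`, the helper `sample_aliases`, and the `helexa/default` entry are hypothetical names used for illustration only; they are not part of this commit.

```rust
use std::collections::HashMap;

/// Hypothetical free-function version of the single-lookup resolver.
/// It has the same shape as the method in the diff: one `get`, then fall back to `id`.
fn resolve_alias<'a>(aliases: &'a HashMap<String, String>, id: &'a str) -> &'a str {
    aliases.get(id).map(String::as_str).unwrap_or(id)
}

/// Illustrative alias table. The "helexa/default" entry is an assumption
/// added here to show an alias that points at another alias.
fn sample_aliases() -> HashMap<String, String> {
    let mut m = HashMap::new();
    m.insert("helexa/small".to_string(), "Qwen/Qwen3-1.7B".to_string());
    m.insert("helexa/default".to_string(), "helexa/small".to_string());
    m
}
```

With this table, resolving `helexa/default` yields `helexa/small` rather than `Qwen/Qwen3-1.7B`, because only one lookup is performed. If operators are expected to keep the table flat, a load-time check that no alias target is itself an alias key might be worth considering, though that is a suggestion rather than something this commit does.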