feat(catalogue,gateway): model aliases (helexa/small, helexa/balanced, helexa/large)
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 39s
CI / Format (push) Successful in 40s
CI / Clippy (push) Successful in 2m21s
CI / Test (push) Successful in 4m40s
build-prerelease / Build neuron-blackwell (push) Successful in 3m38s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build cortex binary (push) Successful in 4m19s
build-prerelease / Package cortex RPM (push) Successful in 1m21s
build-prerelease / Build neuron-ampere (push) Successful in 5m20s
build-prerelease / Build neuron-ada (push) Successful in 4m45s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m59s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 9m40s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m3s
Operators can now define tier aliases in models.toml:
[aliases]
"helexa/small" = "Qwen/Qwen3-1.7B"
"helexa/balanced" = "Qwen/Qwen3-8B"
"helexa/large" = "Qwen/Qwen3.6-27B"
A client request for `model: "helexa/small"` is resolved to the concrete
model id at routing time. The gateway also rewrites the proxied body's
`model` field to the concrete id so neuron sees a name that matches its
loaded handle (otherwise the harness rejects the request).
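The routing-time flow above can be sketched with the standard library only. `ChatRequest`, `route`, and the one-field `RouteDecision` are hypothetical simplifications. The real gateway rewrites the JSON body through `rewrite_model_in_body`, whose signature is not shown in this commit.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the catalogue: only the alias table matters here.
#[derive(Default)]
struct ModelCatalogue {
    aliases: HashMap<String, String>,
}

impl ModelCatalogue {
    /// Single flat lookup; ids that are not aliases come back unchanged.
    fn resolve_alias<'a>(&'a self, id: &'a str) -> &'a str {
        self.aliases.get(id).map(String::as_str).unwrap_or(id)
    }
}

/// Hypothetical, trimmed-down request body (the real one is JSON).
#[derive(Debug, Clone, PartialEq)]
struct ChatRequest {
    model: String,
    prompt: String,
}

/// Hypothetical routing result carrying only the field this commit adds.
#[derive(Debug, PartialEq)]
struct RouteDecision {
    resolved_model_id: String,
}

/// Resolve the alias once at entry, then rewrite the body's `model` so the
/// backend sees the concrete id that matches its loaded handle.
fn route(catalogue: &ModelCatalogue, mut req: ChatRequest) -> (RouteDecision, ChatRequest) {
    let resolved = catalogue.resolve_alias(&req.model).to_string();
    req.model = resolved.clone();
    (RouteDecision { resolved_model_id: resolved }, req)
}
```

The id is resolved once, and everything downstream reads `resolved_model_id`. Metrics labels, the LRU touch, and the proxied body therefore all agree on the concrete name.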
Motivated by an informal, finger-in-the-wind benchmark: the same "what's the
capital of Georgia" probe completes in 2.5s on the 1.7B model vs 6.7s on the
27B, with identical correctness. Aliases let clients pick a latency tier
without hardcoding model ids, and let operators swap targets without
changing client code.
Changes:
* cortex-core: `ModelCatalogue` gains `aliases: HashMap<String, String>`
  and `resolve_alias(&str) -> &str`. Unit tests cover basic resolution
  and the TOML round-trip.
* cortex-gateway:
  * `RouteDecision` gains `resolved_model_id: String`. `router::resolve`
    resolves aliases on entry and threads the concrete id through.
  * Handlers (chat_completions, completions, and anthropic_messages,
    streaming and non-streaming) rewrite the body's `model` field with
    `rewrite_model_in_body` before proxying. They use the resolved id
    for metrics labels, the LRU touch, and the body itself.
  * `/v1/models` (Pass 4) emits each alias as its own entry, mirroring
    the target's `loaded` flag, `feasible_on`, and `locations`. Clients
    browsing the endpoint see both names and can pick either.
* `models.toml` declares the three tier aliases; `models.example.toml`
  documents the section as opt-in.
* Integration tests verify the end-to-end alias→concrete request flow,
  alias surfacing in /v1/models, and no-op fall-through for non-alias
  model ids.
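The `/v1/models` alias surfacing described above might look roughly like this. `ModelEntry` and `with_alias_entries` are hypothetical, because the real Pass 4 code and its entry type are not part of the diff shown here. This sketch also assumes that aliases whose target is not listed are skipped.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one `/v1/models` entry.
#[derive(Debug, Clone, PartialEq)]
struct ModelEntry {
    id: String,
    loaded: bool,
    feasible_on: Vec<String>,
    locations: Vec<String>,
}

/// Append one entry per alias, copying the target's `loaded`,
/// `feasible_on`, and `locations` so clients can pick either name.
fn with_alias_entries(
    mut entries: Vec<ModelEntry>,
    aliases: &HashMap<String, String>,
) -> Vec<ModelEntry> {
    // Sort alias names so output order is deterministic
    // (HashMap iteration order is not).
    let mut names: Vec<&String> = aliases.keys().collect();
    names.sort();
    for alias in names {
        let target = &aliases[alias];
        // Clone the target entry first so `entries` is not borrowed
        // while we push onto it.
        let mirrored = entries.iter().find(|e| &e.id == target).cloned();
        if let Some(mut entry) = mirrored {
            entry.id = alias.clone();
            entries.push(entry);
        }
        // Assumption: an alias with no matching target is left out.
    }
    entries
}
```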
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,6 +2,7 @@

use crate::discovery::DeviceInfo;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::path::Path;

/// A model serving profile loaded from models.toml.
@@ -34,6 +35,14 @@ fn default_min_devices() -> u32 {
pub struct ModelCatalogue {
    #[serde(default)]
    pub models: Vec<ModelProfile>,
    /// Tier aliases — clients can send a request with `model: "helexa/small"`
    /// and the gateway transparently rewrites + routes to the concrete
    /// model id this maps to. Lets operators define latency/quality
    /// tiers (`small`/`balanced`/`large`, `fast`/`thinking`, etc.)
    /// without imposing knowledge of specific model ids on clients.
    /// Loaded from the `[aliases]` table in models.toml.
    #[serde(default)]
    pub aliases: HashMap<String, String>,
}

impl ModelCatalogue {
@@ -70,6 +79,13 @@ impl ModelCatalogue {
    pub fn get(&self, model_id: &str) -> Option<&ModelProfile> {
        self.models.iter().find(|p| p.id == model_id)
    }

    /// Resolve an alias to its concrete model id. Returns `id` verbatim
    /// when it isn't an alias. Aliases never chain — operator config
    /// is treated as flat — so this is a single lookup.
    pub fn resolve_alias<'a>(&'a self, id: &'a str) -> &'a str {
        self.aliases.get(id).map(String::as_str).unwrap_or(id)
    }
}

impl ModelProfile {
@@ -164,4 +180,32 @@ mod tests {
        let devices = [device(0, 1_000), device(1, 1_000)];
        assert!(p.is_feasible_on("anywhere", &devices));
    }

    #[test]
    fn resolve_alias_returns_target_when_alias_present() {
        let mut cat = ModelCatalogue::default();
        cat.aliases
            .insert("helexa/small".into(), "Qwen/Qwen3-1.7B".into());
        assert_eq!(cat.resolve_alias("helexa/small"), "Qwen/Qwen3-1.7B");
    }

    #[test]
    fn resolve_alias_passes_through_when_not_an_alias() {
        let mut cat = ModelCatalogue::default();
        cat.aliases
            .insert("helexa/small".into(), "Qwen/Qwen3-1.7B".into());
        assert_eq!(cat.resolve_alias("Qwen/Qwen3-8B"), "Qwen/Qwen3-8B");
    }

    #[test]
    fn aliases_table_round_trips_through_toml() {
        let src = r#"
[aliases]
"helexa/small" = "Qwen/Qwen3-1.7B"
"helexa/large" = "Qwen/Qwen3.6-27B"
"#;
        let cat: ModelCatalogue = toml::from_str(src).expect("parse aliases table");
        assert_eq!(cat.resolve_alias("helexa/small"), "Qwen/Qwen3-1.7B");
        assert_eq!(cat.resolve_alias("helexa/large"), "Qwen/Qwen3.6-27B");
    }
}
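`resolve_alias` does a single lookup, so an alias that points at another alias resolves only one hop. The sketch below shows that behaviour with a standalone copy of the lookup. It also adds a hypothetical load-time check, `chained_aliases`, which is not part of this commit. An operator tool could use such a check to flag these entries instead of silently routing to a non-concrete id.

```rust
use std::collections::HashMap;

/// Standalone copy of the single-lookup resolution from the diff.
fn resolve_alias<'a>(aliases: &'a HashMap<String, String>, id: &'a str) -> &'a str {
    aliases.get(id).map(String::as_str).unwrap_or(id)
}

/// Hypothetical validation: list aliases whose target is itself an alias,
/// sorted so the result is deterministic.
fn chained_aliases(aliases: &HashMap<String, String>) -> Vec<String> {
    let mut chained: Vec<String> = aliases
        .iter()
        .filter(|(_, target)| aliases.contains_key(target.as_str()))
        .map(|(alias, _)| alias.clone())
        .collect();
    chained.sort();
    chained
}
```

Here "helexa/fast" pointing at "helexa/small" resolves to "helexa/small", not to the concrete model. The check would report "helexa/fast".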