feat(deploy): per-host neuron config + pre-warm headline models
All checks were successful
CI / Format (push) Successful in 39s
build-prerelease / Resolve version stamps (push) Successful in 40s
CI / Clippy (push) Successful in 2m17s
CI / Test (push) Successful in 4m57s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 3m50s
build-prerelease / Build cortex binary (push) Successful in 4m52s
build-prerelease / Package cortex RPM (push) Successful in 1m22s
build-prerelease / Build neuron-ampere (push) Successful in 5m13s
build-prerelease / Build neuron-ada (push) Successful in 5m14s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 2m53s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 2m55s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m45s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m1s
Adds asset/neuron/{beast,benjy,quadbrat}.toml: per-host neuron.toml
files keyed by the first dot-component of the hostname (beast for
beast.hanzalova.internal, and so on). deploy.sh now rsyncs the matching
file to /etc/neuron/neuron.toml on each neuron, then stops and starts
the service so that default_models is re-read.
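The host-to-file rule is small enough to state as code. deploy.sh itself is a shell script whose body is not shown on this page, so the Rust below only illustrates the mapping described above. `short_host`, `config_source` and `CONFIG_DEST` are hypothetical names.

```rust
/// Destination of the synced file on every neuron host.
const CONFIG_DEST: &str = "/etc/neuron/neuron.toml";

/// First dot-separated component of a hostname:
/// "beast.hanzalova.internal" -> "beast". A bare name is returned unchanged.
fn short_host(fqdn: &str) -> &str {
    // split('.') always yields at least one item; unwrap_or is only defensive.
    fqdn.split('.').next().unwrap_or(fqdn)
}

/// Repository path of the per-host config for `fqdn`.
fn config_source(fqdn: &str) -> String {
    format!("asset/neuron/{}.toml", short_host(fqdn))
}
```

For beast.hanzalova.internal this yields asset/neuron/beast.toml, which is then copied to CONFIG_DEST.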

Headline model per host (drives the /v1/models output immediately after
a clean deploy):

  beast     Qwen/Qwen3.6-27B  (q5k, tp=2, devices=[0,1])
  benjy     Qwen/Qwen3-8B     (bf16, devices=[0])
  quadbrat  Qwen/Qwen3-1.7B   (bf16, devices=[0])
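Because each host's headline model is fixed in its config, a post-deploy smoke test could compare the catalogue against the table above. A minimal sketch, assuming the model IDs have already been extracted from the /v1/models response (no HTTP or JSON handling is shown); `expected_headline` and `catalogue_has_headline` are hypothetical helpers.

```rust
/// Headline model each host should advertise after a clean deploy,
/// keyed by short host name (values copied from the per-host table).
fn expected_headline(short_host: &str) -> Option<&'static str> {
    match short_host {
        "beast" => Some("Qwen/Qwen3.6-27B"),
        "benjy" => Some("Qwen/Qwen3-8B"),
        "quadbrat" => Some("Qwen/Qwen3-1.7B"),
        _ => None,
    }
}

/// True if `model_ids` (as listed by /v1/models) contains the host's
/// headline model. Hosts not in the table never pass.
fn catalogue_has_headline(short_host: &str, model_ids: &[&str]) -> bool {
    match expected_headline(short_host) {
        Some(want) => model_ids.contains(&want),
        None => false,
    }
}
```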

This removes the need to follow deploy.sh with `validate-neuron.sh beast
Qwen/Qwen3.6-27B q5k 2` to surface the 27B in the catalogue; the neuron
now loads it itself on activation.

The neuron loop in deploy.sh now mirrors the cortex flow (stop →
install/upgrade → sync config → start), so config-only changes are
picked up on subsequent deploys. Previously, a deploy with no package
change would silently leave the host on the old default_models.
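Why the order matters: default_models is read when the neuron activates, so a synced file only takes effect if it is written while the service is stopped and a start follows. The sketch below models that rule over an abstract step list. It is not deploy.sh; `Step`, `NEURON_LOOP` and `config_takes_effect` are hypothetical, and it assumes systemd-like semantics in which starting an already-running service does nothing (a service with no earlier stop or start in the list is treated as running).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Stop,
    InstallOrUpgrade,
    SyncConfig,
    Start,
}

/// Per-host sequence of the neuron loop after this change; the same four
/// steps run whether or not the package version changed.
const NEURON_LOOP: [Step; 4] = [Step::Stop, Step::InstallOrUpgrade, Step::SyncConfig, Step::Start];

/// True if a freshly synced config will be read at the next activation.
fn config_takes_effect(steps: &[Step]) -> bool {
    let Some(sync) = steps.iter().position(|&s| s == Step::SyncConfig) else {
        return false; // config never written
    };
    // Most recent Stop/Start before the sync decides whether the service is
    // stopped at that moment; no such step means it is assumed running.
    let last_state_change = steps[..sync]
        .iter()
        .rev()
        .copied()
        .find(|&s| s == Step::Stop || s == Step::Start);
    let stopped_at_sync = last_state_change == Some(Step::Stop);
    // A Start after the sync activates the neuron, re-reading default_models.
    let started_after = steps[sync + 1..].contains(&Step::Start);
    stopped_at_sync && started_after
}
```

Under these assumptions the new loop passes, while "sync then start" against a running service, or syncing after the restart, does not.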
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

asset/neuron/beast.toml (new file, +24 lines)

```toml
# neuron.toml for beast.hanzalova.internal
#
# 2x RTX 5090 (32 GB each) — TP-2 capable. Pre-warms Qwen3.6-27B with
# q5k ISQ across both GPUs at activation, matching the validate-neuron
# invocation: `validate-neuron.sh beast.hanzalova.internal
# Qwen/Qwen3.6-27B q5k 2`.
#
# Synced by script/deploy.sh from asset/neuron/<short-host>.toml. Edits
# take effect on the next deploy.sh run (which stops + restarts the
# service so default_models is re-read at activation).

port = 13131

[[harnesses]]
name = "candle"

[harness.candle]

[[default_models]]
model_id = "Qwen/Qwen3.6-27B"
harness = "candle"
quant = "q5k"
tensor_parallel = 2
devices = [0, 1]
```
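beast is the only host that sets tensor_parallel, and its value matches the two entries in devices. A pre-deploy lint could catch a mismatch before a host fails to pre-warm. A minimal sketch, assuming (this commit does not confirm it) that tensor_parallel must equal the number of listed devices, defaults to 1 when omitted as on benjy and quadbrat, and that device indices must be unique. `DefaultModel` and `check_default_model` are hypothetical and are not the neuron's real config types.

```rust
use std::collections::HashSet;

/// Hypothetical view of one [[default_models]] entry.
struct DefaultModel<'a> {
    model_id: &'a str,
    /// None when the key is omitted; assumed to mean 1.
    tensor_parallel: Option<usize>,
    devices: &'a [u32],
}

fn check_default_model(m: &DefaultModel<'_>) -> Result<(), String> {
    let tp = m.tensor_parallel.unwrap_or(1);
    if tp == 0 {
        return Err(format!("{}: tensor_parallel must be at least 1", m.model_id));
    }
    // HashSet::insert returns false on a repeat, so `all` stops at the first duplicate.
    let mut seen = HashSet::new();
    if !m.devices.iter().all(|d| seen.insert(*d)) {
        return Err(format!("{}: duplicate entry in devices", m.model_id));
    }
    if m.devices.len() != tp {
        return Err(format!(
            "{}: tensor_parallel = {} but {} device(s) listed",
            m.model_id,
            tp,
            m.devices.len()
        ));
    }
    Ok(())
}
```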

asset/neuron/benjy.toml (new file, +19 lines)

```toml
# neuron.toml for benjy.hanzalova.internal
#
# 1x RTX 4090 (24 GB) — largest single-GPU host on the fleet. Pre-warms
# Qwen3-8B (bf16, ~18 GB), leaving ~6 GB for KV cache + activations on
# moderate-length contexts.
#
# Synced by script/deploy.sh from asset/neuron/<short-host>.toml.

port = 13131

[[harnesses]]
name = "candle"

[harness.candle]

[[default_models]]
model_id = "Qwen/Qwen3-8B"
harness = "candle"
devices = [0]
```

asset/neuron/quadbrat.toml (new file, +19 lines)

```toml
# neuron.toml for quadbrat.hanzalova.internal
#
# 1x RTX 3060 (12 GB) — small / quantised tier. Pre-warms Qwen3-1.7B
# (bf16, ~4 GB), leaving ~7 GB for KV cache so long contexts on a small
# model still have plenty of room.
#
# Synced by script/deploy.sh from asset/neuron/<short-host>.toml.

port = 13131

[[harnesses]]
name = "candle"

[harness.candle]

[[default_models]]
model_id = "Qwen/Qwen3-1.7B"
harness = "candle"
devices = [0]
```
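The KV-cache headroom quoted in the benjy and quadbrat comments is card VRAM minus the approximate weight footprint. A minimal sketch of that arithmetic; `kv_headroom_gb` is a hypothetical helper and `reserved_gb` a hypothetical allowance for runtime overhead such as the CUDA context, not a value taken from the neuron. The benjy figure (24 - 18 = ~6 GB) implies no reserve, while the quadbrat figure (12 - 4 leaving ~7 GB rather than 8) implies roughly 1 GB.

```rust
/// GB left for KV cache and activations once the weights and any reserved
/// overhead are subtracted from the card's VRAM. Clamped at zero, since a
/// model that does not fit has no headroom at all.
fn kv_headroom_gb(vram_gb: f64, weights_gb: f64, reserved_gb: f64) -> f64 {
    (vram_gb - weights_gb - reserved_gb).max(0.0)
}
```

With the comments' own numbers, `kv_headroom_gb(24.0, 18.0, 0.0)` is 6.0 for benjy and `kv_headroom_gb(12.0, 4.0, 1.0)` is 7.0 for quadbrat.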