First end-to-end run of the deploy workflow succeeded (gitea run #289), so the operator-run rolling-deploy script and its YAML manifest are no longer the source of truth: fleet topology lives in .gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh. Per-host neuron config comments updated to point at the new sync path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
20 lines
425 B
TOML
# neuron.toml for benjy.hanzalova.internal
#
# 1x RTX 4090 (24 GB), the largest single-GPU host on the fleet. Pre-warms
# Qwen3-8B (bf16, ~18 GB), leaving ~6 GB for KV cache + activations on
# moderate-length contexts.
#
# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh.

port = 13131

[[harnesses]]
name = "candle"

[harness.candle]

[[default_models]]
model_id = "Qwen/Qwen3-8B"
harness = "candle"
devices = [0]
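The "~6 GB for KV cache + activations" figure in the header comment can be sanity-checked with a back-of-envelope calculation. The following is a minimal Python sketch, not part of the neuron tooling: it takes the 24 GB and ~18 GB figures from the comment and assumes Qwen3-8B's published attention shape (36 layers, 8 key/value heads, head dimension 128, bf16 cache). The function names and the `reserve_fraction` knob are hypothetical. It estimates how many tokens of KV cache fit in the leftover memory, summed across all sequences the host serves at once.

```python
"""Back-of-envelope VRAM budget for the benjy host (sketch, assumed figures)."""

GB = 1e9  # decimal gigabytes, matching the "24 GB" / "~18 GB" figures in neuron.toml

GPU_VRAM_GB = 24.0        # RTX 4090, from the config comment
MODEL_RESIDENT_GB = 18.0  # Qwen3-8B bf16 footprint quoted in the comment (approximate)

# ASSUMPTION: Qwen3-8B attention shape as published in its Hugging Face config.json
# (36 layers, 8 key/value heads under grouped-query attention, head_dim 128).
# Verify against the actual checkpoint before relying on these numbers.
NUM_LAYERS = 36
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # bf16


def kv_bytes_per_token() -> int:
    """Bytes of KV cache one token occupies: a K and a V vector per KV head, per layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM


def headroom_gb() -> float:
    """VRAM left after the resident model weights, in GB."""
    return GPU_VRAM_GB - MODEL_RESIDENT_GB


def max_cached_tokens(reserve_fraction: float = 0.0) -> int:
    """Upper bound on tokens whose KV cache fits in the headroom.

    reserve_fraction (hypothetical knob) holds back part of the headroom for
    activations; 0.0 gives the optimistic bound that ignores them entirely.
    """
    usable_bytes = headroom_gb() * GB * (1.0 - reserve_fraction)
    return int(usable_bytes // kv_bytes_per_token())


if __name__ == "__main__":
    print(f"headroom: {headroom_gb():.1f} GB")
    print(f"KV cache per token: {kv_bytes_per_token():,} bytes")
    print(f"tokens (no activation reserve): {max_cached_tokens():,}")
    print(f"tokens (25% reserved for activations): {max_cached_tokens(0.25):,}")
```

Under those assumptions each token costs 147,456 bytes of KV cache, so ~6 GB holds roughly 40k tokens before activations are counted, or about 30k if a quarter is held back for them. Actual capacity depends on how the candle harness allocates its cache, which this sketch does not model.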