First end-to-end run of the deploy workflow succeeded (gitea run #289), so the operator-run rolling-deploy script and its YAML manifest are no longer the source of truth: fleet topology lives in .gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh. Per-host neuron config comments updated to point at the new sync path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
20 lines
435 B
TOML
# neuron.toml for quadbrat.hanzalova.internal
#
# 1x RTX 3060 (12 GB): small / quantised tier. Pre-warms Qwen3-1.7B
# (bf16, ~4 GB), leaving ~7 GB for KV cache so long contexts on a small
# model still have plenty of room.
#
# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh.

port = 13131

[[harnesses]]
name = "candle"

[harness.candle]

[[default_models]]
model_id = "Qwen/Qwen3-1.7B"
harness = "candle"
devices = [0]
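
The memory budget quoted in the header comment can be sanity-checked with a short calculation. The sketch below is not part of the repository. The layer count, KV-head count and head dimension are assumptions about the Qwen3-1.7B architecture and should be checked against the model's config.json. The 7 GB KV budget is taken from the comment as-is.

```python
# Back-of-the-envelope VRAM budget for the tier described in the header comment.
# The architecture numbers passed in below are assumptions, not values taken
# from this repository; verify them against the model's config.json.

GB = 1e9  # decimal gigabytes, matching the "~4 GB" / "~7 GB" figures


def weight_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Size of the raw weights; bf16 stores 2 bytes per parameter."""
    return n_params * bytes_per_param


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """One key vector and one value vector per layer per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


weights = weight_bytes(1.7e9)                # nominal 1.7B parameters in bf16
per_token = kv_bytes_per_token(28, 8, 128)   # assumed Qwen3-1.7B shape
kv_budget = 7 * GB                           # figure from the header comment
max_tokens = int(kv_budget // per_token)     # shared across all live sequences

print(f"weights:   {weights / GB:.1f} GB")
print(f"KV/token:  {per_token / 1024:.0f} KiB")
print(f"KV budget: about {max_tokens:,} tokens")
```

Under these assumptions the raw weights come to about 3.4 GB, so the "~4 GB" in the comment presumably includes some runtime overhead. A bf16 KV cache of 112 KiB per token would fit roughly 61,000 tokens in 7 GB.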