chore(deploy): drop deploy.sh and manifest.yml now that workflow runs

First end-to-end run of the deploy workflow succeeded (gitea run #289),
so the operator-run rolling-deploy script and its YAML manifest are no
longer the source of truth: fleet topology lives in
.gitea/workflows/deploy.yml and per-host config in script/infra-setup.sh.

Per-host neuron config comments updated to point at the new sync path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -1,30 +0,0 @@
-# Helexa fleet manifest.
-#
-# Drives rolling deploys via script/deploy.sh and serves as the source
-# of truth for which hosts run cortex vs neuron, and which CUDA
-# compute-capability flavour each neuron host needs.
-#
-# Flavour ↔ NVIDIA generation ↔ compute cap:
-#   ampere     sm_86   (RTX 30 series — e.g. 3060)
-#   ada        sm_89   (RTX 40 series — e.g. 4090)
-#   blackwell  sm_120  (RTX 50 series — e.g. 5090)
-#
-# The flavour determines which RPM is installed on a given neuron host:
-# helexa-neuron-<flavour>. Only one flavour may be installed at a time
-# (the packages Conflict: with each other).
-
-cortex:
-  host: hanzalova.internal
-
-neurons:
-  - host: beast.hanzalova.internal
-    flavour: blackwell
-    gpu: "2x RTX 5090"
-
-  - host: benjy.hanzalova.internal
-    flavour: ada
-    gpu: "RTX 4090"
-
-  - host: quadbrat.hanzalova.internal
-    flavour: ampere
-    gpu: "RTX 3060"
@@ -5,9 +5,9 @@
 # invocation: `validate-neuron.sh beast.hanzalova.internal
 # Qwen/Qwen3.6-27B q5k 2`.
 #
-# Synced by script/deploy.sh from asset/neuron/<short-host>.toml. Edits
-# take effect on the next deploy.sh run (which stops + restarts the
-# service so default_models is re-read at activation).
+# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh. Edits
+# take effect after the next deploy workflow run restarts the service
+# (default_models is read at activation).
 
 port = 13131
 
@@ -4,7 +4,7 @@
 # Qwen3-8B (bf16, ~18 GB), leaving ~6 GB for KV cache + activations on
 # moderate-length contexts.
 #
-# Synced by script/deploy.sh from asset/neuron/<short-host>.toml.
+# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh.
 
 port = 13131
 
@@ -4,7 +4,7 @@
 # (bf16, ~4 GB), leaving ~7 GB for KV cache so long contexts on a small
 # model still have plenty of room.
 #
-# Synced by script/deploy.sh from asset/neuron/<short-host>.toml.
+# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh.
 
 port = 13131
 
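
For readers who did not know the removed manifest, its flavour rule (flavour selects `helexa-neuron-<flavour>`, and the flavour packages conflict with one another) can be sketched as below. This is only an illustration, not code from the repository: the function names `neuron_package` and `conflicting_packages` are hypothetical, and the tables are copied from the deleted manifest's comments and host list.

```python
# Flavour -> CUDA compute capability, copied from the removed manifest's comments.
FLAVOUR_COMPUTE_CAP = {
    "ampere": "sm_86",      # RTX 30 series
    "ada": "sm_89",         # RTX 40 series
    "blackwell": "sm_120",  # RTX 50 series
}

# Neuron hosts and their flavours, as listed in the removed manifest.
NEURONS = {
    "beast.hanzalova.internal": "blackwell",
    "benjy.hanzalova.internal": "ada",
    "quadbrat.hanzalova.internal": "ampere",
}


def neuron_package(flavour: str) -> str:
    """Return the RPM name for a flavour (hypothetical helper)."""
    if flavour not in FLAVOUR_COMPUTE_CAP:
        raise ValueError(f"unknown flavour: {flavour!r}")
    return f"helexa-neuron-{flavour}"


def conflicting_packages(flavour: str) -> list[str]:
    """Return the other flavour packages, which cannot be installed alongside."""
    return sorted(
        neuron_package(other) for other in FLAVOUR_COMPUTE_CAP if other != flavour
    )


if __name__ == "__main__":
    # Print one line per neuron host: host, package, compute capability.
    for host, flavour in NEURONS.items():
        print(host, neuron_package(flavour), FLAVOUR_COMPUTE_CAP[flavour])
```

How .gitea/workflows/deploy.yml encodes the same topology is not shown in this diff, so the sketch says nothing about the workflow's actual structure.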