chore(deploy): drop deploy.sh and manifest.yml now that workflow runs

First end-to-end run of the deploy workflow succeeded (Gitea run #289),
so the operator-run rolling-deploy script and its YAML manifest are no
longer the source of truth: fleet topology lives in
.gitea/workflows/deploy.yml and per-host config in
script/infra-setup.sh. Per-host neuron config comments are updated to
point at the new sync path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 16:41:04 +03:00
parent 577781de8d
commit ea1fdf8aa6
5 changed files with 5 additions and 338 deletions
--- a/asset/manifest.yml
+++ b/asset/manifest.yml
@@ -1,30 +0,0 @@
-# Helexa fleet manifest.
-#
-# Drives rolling deploys via script/deploy.sh and serves as the source
-# of truth for which hosts run cortex vs neuron, and which CUDA
-# compute-capability flavour each neuron host needs.
-#
-# Flavour ↔ NVIDIA generation ↔ compute cap:
-#   ampere    sm_86   (RTX 30 series — e.g. 3060)
-#   ada       sm_89   (RTX 40 series — e.g. 4090)
-#   blackwell sm_120  (RTX 50 series — e.g. 5090)
-#
-# The flavour determines which RPM is installed on a given neuron host:
-# helexa-neuron-<flavour>. Only one flavour may be installed at a time
-# (the packages Conflict: with each other).
-
-cortex:
-  host: hanzalova.internal
-
-neurons:
-  - host: beast.hanzalova.internal
-    flavour: blackwell
-    gpu: "2x RTX 5090"
-
-  - host: benjy.hanzalova.internal
-    flavour: ada
-    gpu: "RTX 4090"
-
-  - host: quadbrat.hanzalova.internal
-    flavour: ampere
-    gpu: "RTX 3060"
--- a/asset/neuron/beast.toml
+++ b/asset/neuron/beast.toml
@@ -5,9 +5,9 @@
 # invocation: `validate-neuron.sh beast.hanzalova.internal
 # Qwen/Qwen3.6-27B q5k 2`.
 #
-# Synced by script/deploy.sh from asset/neuron/<short-host>.toml. Edits
-# take effect on the next deploy.sh run (which stops + restarts the
-# service so default_models is re-read at activation).
+# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh. Edits
+# take effect after the next deploy workflow run restarts the service
+# (default_models is read at activation).

 port = 13131

--- a/asset/neuron/benjy.toml
+++ b/asset/neuron/benjy.toml
@@ -4,7 +4,7 @@
 # Qwen3-8B (bf16, ~18 GB), leaving ~6 GB for KV cache + activations on
 # moderate-length contexts.
 #
-# Synced by script/deploy.sh from asset/neuron/<short-host>.toml.
+# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh.

 port = 13131

--- a/asset/neuron/quadbrat.toml
+++ b/asset/neuron/quadbrat.toml
@@ -4,7 +4,7 @@
 # (bf16, ~4 GB), leaving ~7 GB for KV cache so long contexts on a small
 # model still have plenty of room.
 #
-# Synced by script/deploy.sh from asset/neuron/<short-host>.toml.
+# Synced to /etc/neuron/neuron.toml by script/infra-setup.sh.

 port = 13131