fix(neuron): surface full anyhow chain + ensure $HOME exists at start
Some checks failed
CI / Format (push) Successful in 30s
CI / Test (push) Failing after 49s
CI / Clippy (push) Successful in 2m16s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
Two fixes for problems uncovered by the live validation against beast/benjy/quadbrat:
1. api.rs swallowed everything beyond the outermost anyhow context.
The validation script reported '{"error":"fetch GGUF ...gguf"}' but
the actual underlying hf-hub failure (cache dir creation, network,
auth, etc.) was hidden. Switching every error response to
format!("{e:#}") expands the full cause chain via anyhow's
alternate Display format.
2. The neuron systemd unit declared the service user but never ensured
/var/lib/neuron (its $HOME) existed. hf-hub defaults its cache to
~/.cache/huggingface/hub — when $HOME is absent the cache dir
creation fails and the download aborts. Adding `StateDirectory=neuron`
makes systemd create + chown that directory at activation; no spec
change needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
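
To illustrate the first fix: `to_string()` on an error prints only the outermost message, while anyhow's alternate format (`{e:#}`) prints every cause joined by `: `. The following is a minimal std-only sketch that emulates that behaviour by walking `std::error::Error::source()`. It is not anyhow's implementation; the `Layer` type, the `full_chain` and `sample_error` helpers, and the innermost "permission denied" message are all hypothetical names and values chosen for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type with an optional cause, standing in for
/// one layer of a context chain.
#[derive(Debug)]
struct Layer {
    msg: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl Layer {
    fn new(msg: &str, source: Option<Box<dyn Error + 'static>>) -> Self {
        Layer { msg: msg.to_string(), source }
    }
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display prints only this layer's own message, not its causes.
        write!(f, "{}", self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Join the message of `err` and of every cause below it with ": ".
fn full_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

/// Build an illustrative three-layer chain (messages are assumptions).
fn sample_error() -> Layer {
    let io = Layer::new("permission denied", None);
    let cache = Layer::new("failed to create cache dir", Some(Box::new(io)));
    Layer::new("fetch GGUF <file>", Some(Box::new(cache)))
}
```

With this sketch, `sample_error().to_string()` yields only `fetch GGUF <file>`, which is the shape of the truncated response the validation script saw, whereas `full_chain(&sample_error())` yields `fetch GGUF <file>: failed to create cache dir: permission denied`.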
@@ -56,7 +56,7 @@ async fn list_models(State(state): State<Arc<NeuronState>>) -> impl IntoResponse
         Ok(models) => Json(json!(models)).into_response(),
         Err(e) => (
             StatusCode::INTERNAL_SERVER_ERROR,
-            Json(json!({"error": e.to_string()})),
+            Json(json!({"error": format!("{e:#}")})),
         )
             .into_response(),
     }
@@ -71,7 +71,7 @@ async fn load_model(
         Ok(()) => Json(json!({"status": "loaded"})).into_response(),
         Err(e) => (
             StatusCode::BAD_REQUEST,
-            Json(json!({"error": e.to_string()})),
+            Json(json!({"error": format!("{e:#}")})),
         )
             .into_response(),
     }
@@ -95,7 +95,11 @@ async fn unload_model(
     let registry = state.registry.read().await;
     match registry.unload_model(&model_id).await {
         Ok(()) => Json(json!({"status": "unloaded"})).into_response(),
-        Err(e) => (StatusCode::NOT_FOUND, Json(json!({"error": e.to_string()}))).into_response(),
+        Err(e) => (
+            StatusCode::NOT_FOUND,
+            Json(json!({"error": format!("{e:#}")})),
+        )
+            .into_response(),
     }
 }
 
@@ -151,7 +155,7 @@ async fn chat_completions(
             .into_response(),
         Err(InferenceError::Other(e)) => (
             StatusCode::INTERNAL_SERVER_ERROR,
-            Json(json!({"error": e.to_string()})),
+            Json(json!({"error": format!("{e:#}")})),
         )
             .into_response(),
     }
@@ -165,7 +169,7 @@ async fn chat_completions(
             .into_response(),
         Err(InferenceError::Other(e)) => (
             StatusCode::INTERNAL_SERVER_ERROR,
-            Json(json!({"error": e.to_string()})),
+            Json(json!({"error": format!("{e:#}")})),
         )
             .into_response(),
     }
@@ -10,6 +10,12 @@ Restart=on-failure
 RestartSec=5
 User=neuron
 Group=neuron
+# /var/lib/neuron is the neuron user's $HOME — hf-hub writes its
+# default cache there (~/.cache/huggingface/hub). Without this directive
+# systemd doesn't create the directory and hf-hub downloads fail with
+# "fetch GGUF <file>: failed to create cache dir".
+StateDirectory=neuron
+StateDirectoryMode=0755
 # Loading default_models from neuron.toml happens before the HTTP
 # listener binds; large models can take many minutes to download and
 # materialise on first activation. systemd's default TimeoutStartSec
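
The failure that `StateDirectory=neuron` prevents can be sketched in a few lines of Rust. This is an assumption-laden illustration, not hf-hub's actual code: the hypothetical `ensure_hub_cache` helper derives the cache path from a home directory and tries to create it. When the home directory itself is missing and the process cannot create it (presumably the case for an unprivileged service user under a root-owned `/var/lib`), the call returns an I/O error before any download starts.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: build `<home>/.cache/huggingface/hub` and create
/// it along with any missing parents, returning the final path.
fn ensure_hub_cache(home: &Path) -> io::Result<PathBuf> {
    let cache = home.join(".cache").join("huggingface").join("hub");
    // create_dir_all also has to create `home` when it does not exist,
    // so it fails if the process may not write to home's parent.
    fs::create_dir_all(&cache)?;
    Ok(cache)
}
```

Once systemd has created and chowned the state directory at activation, the equivalent of `ensure_hub_cache(Path::new("/var/lib/neuron"))` only needs to create subdirectories inside a directory the service user already owns.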