-
aa88d37509
fix(gateway): full observability + stop leaking upstream bodies
main
rob thijssen
2026-05-22 07:17:26 +03:00
-
0f00f72b47
fix(router,handlers): strip trailing slash from rewritten URL + log upstream failures
rob thijssen
2026-05-22 07:10:39 +03:00
-
9b0ed0b57f
fix(router): rewrite loopback inference URLs to use neuron's host
rob thijssen
2026-05-22 06:23:47 +03:00
-
dc2a803266
fix(rpm): migrate legacy helexa-cortex firewalld service to cortex
rob thijssen
2026-05-22 06:12:51 +03:00
-
e71181499e
feat(stage-8e-3): quantize lm_head in TP Qwen3-Next
rob thijssen
2026-05-21 21:53:14 +03:00
-
ee663e5e99
fix(stage-8e-2e): bump quant prefill threshold to M > 64
rob thijssen
2026-05-21 21:50:45 +03:00
-
34f9b77d9d
feat(stage-8e-2d): route quantized matmul by M (prefill vs decode)
rob thijssen
2026-05-21 21:15:32 +03:00
-
f084aaab8e
fix(stage-8e-2c): cast bf16/f16 activations to f32 around QMatMul
rob thijssen
2026-05-21 20:05:19 +03:00
-
68a606a79c
fix(stage-8e-2b): allow quant on the TP load path
rob thijssen
2026-05-21 19:17:14 +03:00
-
4aa71902d0
feat(stage-8e-2): plumb quant config from ModelSpec to TP load path
rob thijssen
2026-05-21 18:03:36 +03:00
-
bef159b21c
feat(stage-8e-1): MaybeQuantLinear primitive + parallel-linear quant variants
rob thijssen
2026-05-21 17:55:26 +03:00
-
8d7b099b36
feat(stage-8d-7): direct safetensors fused-region loader
rob thijssen
2026-05-21 17:49:35 +03:00
-
89d98d1fb2
diag(stage-8d-6): per-layer VRAM logging in TP load path
rob thijssen
2026-05-21 12:54:05 +03:00
-
cc95fe28d9
feat(stage-8d-5b): wire fused_gdn_gating CUDA kernel
rob thijssen
2026-05-21 11:52:38 +03:00
-
09c945f81e
feat(stage-8d-4): dispatch chunked_gated_delta_rule_recurrence at prefill
rob thijssen
2026-05-21 11:50:30 +03:00
-
05dc0bad18
feat(stage-8d-3): wire causal_conv1d_update/full CUDA kernels
rob thijssen
2026-05-21 11:49:41 +03:00
-
10c151efa5
feat(stage-8d-5): wire gated_delta_rule_recurrence kernel into tp_qwen3_5
rob thijssen
2026-05-21 11:44:12 +03:00
-
44ae927e38
feat(stage-8d-2): wire gated_delta_rule_recurrence kernel into qwen3_5
rob thijssen
2026-05-21 11:39:30 +03:00
-
1ebbe87651
feat(stage-8d-1): import mistralrs GDN CUDA kernels — build infra only
rob thijssen
2026-05-21 11:34:11 +03:00
-
70eb6af42b
feat(tp): cancellation-safe inference + structured tracing
rob thijssen
2026-05-21 08:22:00 +03:00
-
d1a4aad91d
fix(tp): always drain worker responses on leader failure
rob thijssen
2026-05-21 07:39:36 +03:00
-
95dc8745eb
feat(stage-8c): TP-aware Qwen3-Next (tp_qwen3_5)
rob thijssen
2026-05-20 22:02:42 +03:00
-
495d3f7c05
fix(qwen3_5): promote beta to F32 alongside q/k/v in delta rule
rob thijssen
2026-05-20 21:13:19 +03:00
-
5c4c8e0eba
fix(qwen3_5): tensor names are under model.language_model.*, not model.*
rob thijssen
2026-05-20 16:47:51 +03:00
-
07c44d5db1
fix(qwen3_5): nested rope_parameters + partial_rotary_factor=0.25
rob thijssen
2026-05-20 16:18:52 +03:00
-
e7eb3dab6a
feat(stage-8c): full-attention layer + decoder + Model + ForCausalLM for qwen3_5
rob thijssen
2026-05-20 15:52:33 +03:00
-
180274548d
feat(stage-8c): linear-attention layer (Qwen3-Next GatedDeltaNet)
rob thijssen
2026-05-20 09:29:52 +03:00
-
a70f317729
feat(stage-8c): scaffold qwen3_5 (Qwen3.6) — dispatch + stubs + TP gate
rob thijssen
2026-05-20 08:58:01 +03:00
-
c6022aa6b9
feat(stage-8b): Llama + Qwen3 MoE families on the candle harness
rob thijssen
2026-05-20 08:36:22 +03:00
-
9e31d8deca
feat(stage-8a): pre-flight architecture check for dense model loads
rob thijssen
2026-05-20 08:27:29 +03:00
-
b400e8b704
feat(neuron): honour HF_HUB_CACHE / HF_HOME for the candle harness cache
rob thijssen
2026-05-20 07:52:50 +03:00
-
62ca125a68
chore: keep models.example.toml generic; deploy.sh syncs local models.toml
rob thijssen
2026-05-20 07:47:08 +03:00
-
735945ee81
feat(cortex): unified /v1/models — catalogue × topology feasibility + cold-load
rob thijssen
2026-05-20 07:39:04 +03:00
-
f72dee094f
feat(tp): Stage 7c-i — streaming SSE through TP
rob thijssen
2026-05-20 07:32:46 +03:00
-
d46d8d4f6c
feat(tp): Stage 7b-iv — RPC + orchestration for TP load/inference
rob thijssen
2026-05-20 06:38:33 +03:00
-
9b8bd146f6
feat(tp): --tp-smoke CLI subcommand + remote validation script
rob thijssen
2026-05-19 19:40:25 +03:00
-
96d8755245
fix(tp): add half dep + drop double-wrapped .w() on CudaDevice::alloc
rob thijssen
2026-05-19 19:11:59 +03:00
-
12549c9aed
fix(tp): import BackendStorage trait for CudaStorage methods
rob thijssen
2026-05-19 18:32:05 +03:00
-
46527d7804
feat(tp): TP-aware Qwen3 dense model (Stage 7b-iii 2/2)
rob thijssen
2026-05-19 18:24:20 +03:00
-
8d3194f992
Stage 7b-iii (1/2): AllReduce CustomOp + ShardedVarBuilder-backed TP linears
rob thijssen
2026-05-19 18:14:54 +03:00
-
5436af9c73
fix(neuron/candle): dense Qwen3 returns rank-3 logits, double-squeeze
rob thijssen
2026-05-19 17:49:43 +03:00
-
8e882c0757
fix(neuron/tp): NcclError {e:?} + cudarc 0.19 deprecation cleanup
rob thijssen
2026-05-19 17:24:13 +03:00
-
93421f48e2
Stage 7b-ii: ColumnParallel + RowParallel sharded linear primitives
rob thijssen
2026-05-19 17:07:19 +03:00
-
05e15f3597
Stage 7b-i: dense safetensors Qwen3 load path
rob thijssen
2026-05-19 17:03:59 +03:00
-
da068ded6d
Stage 7a-ii: real NCCL handshake behind the worker pool
rob thijssen
2026-05-19 16:40:01 +03:00
-
2a7ede0232
Stage 7a-i: TP worker lifecycle scaffolding
rob thijssen
2026-05-19 15:53:00 +03:00
-
18ae3c30ee
post-validation cleanup: cuDNN runtime + repetition penalty
rob thijssen
2026-05-19 14:48:08 +03:00
-
1a0400131e
fix(deploy): use dnf upgrade for stale installs, install only when absent
rob thijssen
2026-05-19 14:10:48 +03:00
-
1866b99a89
fix(validate-neuron): jq for JSON, say→stderr, sane max_tokens
rob thijssen
2026-05-19 13:43:02 +03:00
-
60176e7c2e
ci: monotonic prerelease versions + serialize CI on shared runner
rob thijssen
2026-05-19 13:36:53 +03:00
-
602e8e1471
fix(neuron/candle): source tokenizer.json from base repo when GGUF
rob thijssen
2026-05-19 13:16:39 +03:00
-
e9d0a75dd5
ci(prerelease): auto-build on every push to main
rob thijssen
2026-05-19 13:13:36 +03:00
-
6cf87e328f
chore(neuron): log load_model failures server-side with full chain
rob thijssen
2026-05-19 13:08:54 +03:00
-
f9f5fa41b6
fix(neuron): surface full anyhow chain + ensure $HOME exists at start
rob thijssen
2026-05-19 08:17:37 +03:00
-
ed4d71db09
fix(validate-neuron): default to unsloth GGUF + capture curl errors
rob thijssen
2026-05-19 08:14:31 +03:00
-
39010c779f
add script/validate-neuron.sh — end-to-end candle harness smoke test
rob thijssen
2026-05-19 07:58:05 +03:00
-
57d7ef8d3c
chore: revert dnf. runner user has no system privs
rob thijssen
2026-05-19 07:16:38 +03:00
-
0e9671dd7d
fix(ci): drop sudo from dnf install (runner runs as root, no sudo)
rob thijssen
2026-05-19 07:06:52 +03:00
-
e29c9e35f0
fix(ci): ensure rust toolchain present on cuda-13.0 runner
rob thijssen
2026-05-19 07:04:57 +03:00
-
8a2334eacb
deploy: dnf-native version check + lair.cafe repo bootstrap
rob thijssen
2026-05-18 18:55:02 +03:00
-
aad314cdfa
feat(neuron): graceful unload-on-shutdown via SIGTERM/SIGINT
rob thijssen
2026-05-18 17:58:07 +03:00
-
6779b7526a
feat(neuron): load default_models on service activation
rob thijssen
2026-05-18 17:56:08 +03:00
-
84f5662df1
feat(neuron): OpenAI-compatible SSE streaming chat completions
rob thijssen
2026-05-18 17:53:14 +03:00
-
249c9442e8
chore: track deployment script
rob thijssen
2026-05-18 17:50:35 +03:00
-
5e17081fb4
ci(prerelease): drop redundant rustup install step
rob thijssen
2026-05-18 17:47:29 +03:00
-
03bed93fee
add asset/manifest.yml describing fleet hosts and neuron flavours
rob thijssen
2026-05-18 17:37:14 +03:00
-
4a5211d830
ci(prerelease): add ampere flavour alongside ada and blackwell
rob thijssen
2026-05-18 17:28:19 +03:00
-
6d2dc5ff1a
fix(ci): give fmt/clippy/test distinct CARGO_TARGET_DIR to avoid races
rob thijssen
2026-05-18 17:26:29 +03:00
-
b713dbe669
fix(ci): pass GPG secrets via env to avoid Gitea log leakage
rob thijssen
2026-05-18 17:13:52 +03:00
-
5c957d08ec
ci: add build-prerelease workflow for CUDA RPMs on rpm.lair.cafe
rob thijssen
2026-05-18 17:01:35 +03:00
-
729317d1ef
feat(neuron): OpenAI-compatible non-streaming chat completion
rob thijssen
2026-05-18 16:47:58 +03:00
-
5c2bd1a1da
feat(neuron): wire candle harness load/unload via GGUF
rob thijssen
2026-05-18 16:02:49 +03:00
-
3cccc2c56b
refactor(neuron): cut mistralrs/llamacpp, scaffold candle harness
rob thijssen
2026-05-18 15:53:04 +03:00
-
7f797b0265
ci: parallelise fmt/clippy/test and drop sccache install step
rob thijssen
2026-05-11 13:55:17 +03:00
-
5a0360c1d5
ci: use container runner labels for CI jobs
rob thijssen
2026-05-11 13:29:42 +03:00
-
472c0e8737
fix(rpm): ship firewalld service definitions with correct ports
rob thijssen
2026-04-23 14:05:14 +03:00
-
b9d8e30058
chore: bump version to 0.1.16
Gitea Actions
2026-04-16 15:04:21 +00:00
-
25f75fe552
chore: ignore local deploy script
v0.1.16
rob thijssen
2026-04-16 17:45:18 +03:00
-
3f94c50817
chore: move default ports out of common-collision ranges
rob thijssen
2026-04-16 17:35:09 +03:00
-
3e1fb60076
ci: drop actions/cache for cargo registry and target
rob thijssen
2026-04-16 16:47:32 +03:00
-
0184ccab28
chore: move default ports out of common-collision ranges
v0.1.15
rob thijssen
2026-04-16 17:35:09 +03:00
-
9bf987888c
chore: bump version to 0.1.14
Gitea Actions
2026-04-16 16:57:24 +03:00
-
471b9b7629
ci: drop actions/cache for cargo registry and target
rob thijssen
2026-04-16 16:47:32 +03:00
-
abe4ff7ccc
ci: publish both packages to a single helexa/helexa COPR project
v0.1.14
rob thijssen
2026-04-16 16:29:38 +03:00
-
7c3390a4e1
fix(rpm): rename neuron package to helexa-neuron
rob thijssen
2026-04-16 16:19:42 +03:00
-
2ff062da0e
ci: commit generated %changelog entries back to main
rob thijssen
2026-04-16 15:44:45 +03:00
-
357f858a29
chore: bump version to 0.1.12
Gitea Actions
2026-04-16 15:47:21 +03:00
-
556e5293dc
fix(rpm): explicitly Provides user(name) to satisfy systemd unit Requires
v0.1.12
rob thijssen
2026-04-16 15:30:55 +03:00
-
1d90238b01
ci: migrate rpm changelog generation to reusable action
rob thijssen
2026-04-16 15:23:45 +03:00
-
d99b25fb8a
ci: auto-generate rpm changelog entry per release
rob thijssen
2026-04-16 15:04:36 +03:00
-
034da319f1
fix(rpm): correct weekday in changelog entry
rob thijssen
2026-04-16 14:58:40 +03:00
-
e874c3483d
fix(rpm): explicitly Provides user(name) to satisfy systemd unit Requires
v0.1.11
rob thijssen
2026-04-16 15:30:55 +03:00
-
2caaae018a
ci: migrate rpm changelog generation to reusable action
rob thijssen
2026-04-16 15:23:45 +03:00
-
7ece281617
chore: bump version to 0.1.10
Gitea Actions
2026-04-16 15:06:18 +03:00
-
18d00001cf
ci: auto-generate rpm changelog entry per release
rob thijssen
2026-04-16 15:04:36 +03:00
-
ad1442c096
fix(rpm): correct weekday in changelog entry
rob thijssen
2026-04-16 14:58:40 +03:00
-
3bb5b3c425
fix(rpm): drop %attr(,,user) on config files to avoid dnf silent filter
v0.1.10
rob thijssen
2026-04-16 14:33:08 +03:00
-
123f692203
fix(rpm): drop %attr(,,user) on config files to avoid dnf silent filter
v0.1.9
rob thijssen
2026-04-16 14:33:08 +03:00
-
9fa51ad874
chore: bump version to 0.1.8
Gitea Actions
2026-04-16 10:56:07 +00:00
-
9697fbae73
fix(neuron): run service as neuron user, not cortex
v0.1.8
rob thijssen
2026-04-16 13:32:36 +03:00