feat(helexa-acp): image input for vision-capable models
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 34s
CI / Format (push) Successful in 37s
CI / Clippy (push) Successful in 2m33s
CI / Test (push) Successful in 5m4s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 6m2s
build-prerelease / Build neuron-ampere (push) Successful in 7m49s
build-prerelease / Build neuron-ada (push) Successful in 5m27s
build-prerelease / Build cortex binary (push) Successful in 4m16s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m2s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m47s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s
Stage 5. Zed clipboard/DnD images get forwarded as OpenAI
content-array messages on user turns.

- New MessageContent::MultiPart variant + MessagePart (Text|Image)
  + ImageData struct (mime_type, base64 data, optional uri).
- flatten_prompt now produces structured content: collapses to
  Text when every block is text (some upstreams treat array-form
  as vision-only and refuse on text-only models), otherwise
  produces MultiPart preserving block order.
- OpenAI encoder emits `[{type:"text",text:…}, {type:"image_url",
  image_url:{url:"data:{mime};base64,{data}"}}]` for MultiPart user
  messages. Data URIs are used over remote `uri` because they
  round-trip through every upstream we care about.
- prompt_capabilities.image = true at initialize so Zed actually
  sends image blocks.
- compaction estimates ~512 tokens per image (the middle of the
  Qwen3-VL / OpenAI detail range) so the budget tracker doesn't
  pretend images are free.
- session/load replays image-bearing user turns by surfacing the
  text parts verbatim and rendering each image as a "[image: {mime}
  ({n} bytes)]" placeholder chunk, so Zed can show the prior text
  context even though re-uploading the bytes through ACP isn't
  meaningful for resume.
- 4 new tests: flatten produces MultiPart in block order, image-only
  prompts still flatten to MultiPart, encoder emits the correct
  array shape, text-only encoding stays as the string form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
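
The flattening and encoding rules described in the message can be sketched roughly as follows. This is an illustration, not the code from this commit: the `Block` input type and the `flatten` and `data_url` function names are hypothetical, the field layout of `ImageData` is inferred from the message, and the newline used to join text blocks is an assumption since the message does not state a separator.

```rust
/// Image payload with the three fields listed in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageData {
    pub mime_type: String,
    pub data: String, // base64-encoded bytes
    pub uri: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum MessagePart {
    Text { text: String },
    Image(ImageData),
}

#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text { text: String },
    MultiPart { parts: Vec<MessagePart> },
}

/// Hypothetical stand-in for an incoming prompt block.
pub enum Block {
    Text(String),
    Image(ImageData),
}

/// Collapse to `Text` when every block is text; otherwise keep every
/// block, in its original order, as a `MultiPart`.
pub fn flatten(blocks: Vec<Block>) -> MessageContent {
    if blocks.iter().all(|b| matches!(b, Block::Text(_))) {
        let text = blocks
            .into_iter()
            .filter_map(|b| match b {
                Block::Text(t) => Some(t),
                Block::Image(_) => None,
            })
            .collect::<Vec<_>>()
            .join("\n"); // assumption: separator is not given in the source
        return MessageContent::Text { text };
    }
    let parts = blocks
        .into_iter()
        .map(|b| match b {
            Block::Text(text) => MessagePart::Text { text },
            Block::Image(img) => MessagePart::Image(img),
        })
        .collect();
    MessageContent::MultiPart { parts }
}

/// Build the `data:` URL that goes into `image_url.url`.
pub fn data_url(img: &ImageData) -> String {
    format!("data:{};base64,{}", img.mime_type, img.data)
}
```

Under these assumptions a prompt of two text blocks flattens to a single `Text`, while a prompt containing only an image still becomes a one-element `MultiPart`, which matches the behavior the new tests are described as covering.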
@@ -32,7 +32,7 @@
 //! (over-estimates tokens slightly) so we compact a touch early
 //! rather than a touch late.
 
-use crate::provider::{Message, MessageContent, Role};
+use crate::provider::{Message, MessageContent, MessagePart, Role};
 
 /// Most-recent N messages that are never elided. Roughly "the
 /// current tool round in flight" — assistant turn that called the
@@ -54,6 +54,13 @@ const CHARS_PER_TOKEN: f32 = 3.5;
 /// to a few tokens; tiny but it adds up across long histories.
 const ENVELOPE_TOKENS: usize = 8;
 
+/// Rough per-image token cost used by the budget estimator. Real
+/// vision tokenizers vary widely (256–1024 tokens for typical
+/// resolutions on Qwen3-VL, OpenAI's `low`/`high` detail toggles
+/// pick between ~85 and ~1000+). 512 is a defensible middle that
+/// keeps compaction from treating images as free.
+const IMAGE_TOKENS_APPROX: usize = 512;
+
 /// Stats reported back from [`compact_to_budget`] for the caller to
 /// log. The numbers are estimates (see [`estimate_tokens`]), so
 /// don't compare them to upstream-reported token counts as if they
@@ -87,6 +94,19 @@ impl CompactionStats {
 pub fn estimate_tokens(msg: &Message) -> usize {
     let chars = match &msg.content {
         MessageContent::Text { text } => text.len(),
+        MessageContent::MultiPart { parts } => parts
+            .iter()
+            .map(|p| match p {
+                MessagePart::Text { text } => text.len(),
+                // Each image is one block in the context window; the
+                // upstream tokenizer handles the real cost (and it
+                // varies wildly by model — Qwen3-VL uses ~256-1024
+                // tokens per image depending on size). Take a
+                // middle estimate so the budget tracker doesn't
+                // pretend images are free.
+                MessagePart::Image(_) => (IMAGE_TOKENS_APPROX as f32 * CHARS_PER_TOKEN) as usize,
+            })
+            .sum(),
         MessageContent::ToolCalls { text, calls } => {
             let txt = text.as_deref().map(|s| s.len()).unwrap_or(0);
             let calls_size: usize = calls
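
The image arm converts a per-image token estimate into an equivalent character count so that images go through the same character-based path as text. A small sketch of that arithmetic, using the two constants shown in the diff, makes the placement of the cast visible: in Rust `as` binds more tightly than `*`, so casting `CHARS_PER_TOKEN` on its own truncates 3.5 to 3 before the multiplication. The function names below are illustrative only, and the final division back to tokens is an assumption about how the estimator consumes the character count.

```rust
const CHARS_PER_TOKEN: f32 = 3.5;
const IMAGE_TOKENS_APPROX: usize = 512;

/// Cast applied to the factor alone: 3.5 truncates to 3 before multiplying.
pub fn image_chars_cast_first() -> usize {
    IMAGE_TOKENS_APPROX * CHARS_PER_TOKEN as usize
}

/// Multiply in f32 first, then cast the product.
pub fn image_chars_cast_last() -> usize {
    (IMAGE_TOKENS_APPROX as f32 * CHARS_PER_TOKEN) as usize
}

/// Assumed conversion back to tokens: divide by CHARS_PER_TOKEN and round up.
pub fn chars_to_tokens(chars: usize) -> usize {
    (chars as f32 / CHARS_PER_TOKEN).ceil() as usize
}
```

With these constants the first form yields 1536 characters (439 tokens under the assumed conversion) and the second yields 1792 characters (512 tokens), so only the second matches the stated per-image estimate.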
@@ -206,6 +226,15 @@ fn elide_in_place(msg: &mut Message) -> bool {
             *text = format!("(elided: {} bytes of assistant prose)", text.len());
             true
         }
+        MessageContent::MultiPart { .. } => {
+            // MultiPart messages today only exist as User turns,
+            // and User turns are protected by the role check in
+            // `compact_to_budget` — so this branch is unreachable
+            // for current call sites. Returning false keeps the
+            // unreachable path benign if a future stage starts
+            // emitting MultiPart on other roles.
+            false
+        }
     }
 }
 
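
The comment in the new arm relies on an invariant held elsewhere: user turns never reach `elide_in_place`. A rough sketch of that arrangement follows. The `compact` loop, the simplified `Message` and `Role` types, and the reduced variant set are assumptions for illustration; only the "user turns are skipped" and "MultiPart returns false" behaviors are taken from the diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text { text: String },
    MultiPart { parts: Vec<String> }, // parts simplified to strings here
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: MessageContent,
}

/// Returns true when the message was shortened.
pub fn elide_in_place(msg: &mut Message) -> bool {
    match &mut msg.content {
        MessageContent::Text { text } => {
            *text = format!("(elided: {} bytes of assistant prose)", text.len());
            true
        }
        // Left untouched: reports "nothing changed" rather than panicking.
        MessageContent::MultiPart { .. } => false,
    }
}

/// Hypothetical caller: skips user turns and counts elided messages.
pub fn compact(history: &mut [Message]) -> usize {
    history
        .iter_mut()
        .filter(|m| m.role != Role::User)
        .map(elide_in_place)
        .filter(|changed| *changed)
        .count()
}
```

Returning `false` instead of using `unreachable!()` means a future non-user `MultiPart` message would simply be left uncompacted rather than aborting the session.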