feat(helexa-acp): image input for vision-capable models
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 34s
CI / Format (push) Successful in 37s
CI / Clippy (push) Successful in 2m33s
CI / Test (push) Successful in 5m4s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 6m2s
build-prerelease / Build neuron-ampere (push) Successful in 7m49s
build-prerelease / Build neuron-ada (push) Successful in 5m27s
build-prerelease / Build cortex binary (push) Successful in 4m16s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m2s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m47s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s
All checks were successful
build-prerelease / Resolve version stamps (push) Successful in 34s
CI / Format (push) Successful in 37s
CI / Clippy (push) Successful in 2m33s
CI / Test (push) Successful in 5m4s
CI / Build cortex SRPM (push) Has been skipped
CI / Build neuron SRPM (push) Has been skipped
CI / Publish cortex to COPR (push) Has been skipped
CI / Publish neuron to COPR (push) Has been skipped
CI / Bump version in source (push) Has been skipped
build-prerelease / Build neuron-blackwell (push) Successful in 6m2s
build-prerelease / Build neuron-ampere (push) Successful in 7m49s
build-prerelease / Build neuron-ada (push) Successful in 5m27s
build-prerelease / Build cortex binary (push) Successful in 4m16s
build-prerelease / Package cortex RPM (push) Successful in 1m19s
build-prerelease / Package helexa-neuron-ampere RPM (push) Successful in 3m2s
build-prerelease / Package helexa-neuron-ada RPM (push) Successful in 3m10s
build-prerelease / Package helexa-neuron-blackwell RPM (push) Successful in 3m47s
build-prerelease / Publish to rpm.lair.cafe (unstable) (push) Successful in 1m2s
Stage 5. Zed clipboard/DnD images get forwarded as OpenAI
content-array messages on user turns.
- New MessageContent::MultiPart variant + MessagePart (Text|Image)
+ ImageData struct (mime_type, base64 data, optional uri).
- flatten_prompt now produces structured content: collapses to
Text when every block is text (some upstreams treat array-form
as vision-only and refuse on text-only models), otherwise
produces MultiPart preserving block order.
- OpenAI encoder emits `[{type:"text",text:…}, {type:"image_url",
image_url:{url:"data:{mime};base64,{data}"}}]` for MultiPart user
messages. Data URIs are used over remote `uri` because they
round-trip through every upstream we care about.
- prompt_capabilities.image = true at initialize so Zed actually
sends image blocks.
- compaction estimates ~512 tokens per image (the middle of the
Qwen3-VL / OpenAI detail range) so the budget tracker doesn't
pretend images are free.
- session/load replays image-bearing user turns by surfacing the
text parts verbatim and rendering each image as a "[image: {mime}
({n} bytes)]" placeholder chunk — Zed can show the prior text
context even though re-uploading the bytes through ACP isn't
meaningful for resume.
- 4 new tests: flatten produces MultiPart in block order, image-only
prompts still flatten to MultiPart, encoder emits the correct
array shape, text-only encoding stays as the string form.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -101,6 +101,11 @@ pub enum MessageContent {
|
||||
/// enum, which is incompatible with newtype-of-primitive
|
||||
/// variants.
|
||||
Text { text: String },
|
||||
/// Mixed text + image user turn. Stage 5 introduces this when
|
||||
/// Zed sends an `ImageContent` block alongside the user's prompt.
|
||||
/// Providers that don't support vision should down-convert by
|
||||
/// dropping image parts and concatenating text parts.
|
||||
MultiPart { parts: Vec<MessagePart> },
|
||||
/// Assistant turn that called one or more tools. Stage 3 starts
|
||||
/// constructing this when the provider stream yields a
|
||||
/// `ToolCallStart` / `ToolCallArgsDelta` sequence.
|
||||
@@ -117,6 +122,31 @@ pub enum MessageContent {
|
||||
},
|
||||
}
|
||||
|
||||
/// One part of a [`MessageContent::MultiPart`] message.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
#[serde(tag = "type", rename_all = "snake_case")]
|
||||
pub enum MessagePart {
|
||||
Text { text: String },
|
||||
Image(ImageData),
|
||||
}
|
||||
|
||||
/// Inline image attachment. `data` is base64-encoded raw image
|
||||
/// bytes; the encoder constructs an `image_url` data URI from it
|
||||
/// at request time. `uri` carries any pointer the client supplied
|
||||
/// (e.g. `file:///tmp/x.png`) — we keep it on the message for
|
||||
/// debugging / future providers but the OpenAI encoder ignores it
|
||||
/// when `data` is present (data wins, since it round-trips through
|
||||
/// every wire format).
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct ImageData {
|
||||
pub mime_type: String,
|
||||
/// Base64-encoded image bytes (no `data:` prefix, no padding
|
||||
/// stripped — exactly what `ImageContent.data` carried).
|
||||
pub data: String,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
pub uri: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct ToolCall {
|
||||
/// Provider-assigned id that ties the call to its result. The
|
||||
|
||||
Reference in New Issue
Block a user