Module capabilities


Model capability detection, inference, and request transformation.

This module owns two orthogonal pipelines that operate on different phases of a model request:

§1. Request-side capability pipeline

Before a chat-completion request is forwarded to llama-server, the proxy consults the stored ModelCapabilities flags to decide whether to rewrite the message list. Flags are inferred at import time and stored in the database; they can also be overridden at any time via the API or CLI.

Layer	Function	When it fires
Template analysis	`infer_from_chat_template`	At model import — reads `tokenizer.chat_template` from the GGUF
Architecture registry	`capabilities_from_architecture`	At model import — reads `general.architecture` as a backstop when the GGUF ships without a chat template
Request rewriting	`transform_messages_for_capabilities`	At proxy time — merges consecutive same-role messages for models that require strict turn alternation

The results of the first two layers (template analysis and the architecture registry) are OR-combined and stored in Model.capabilities. The proxy reads this value once per request via a single catalog lookup.
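As a rough illustration of how the stored flags could drive request rewriting, the sketch below OR-combines two flag sets and then merges consecutive same-role messages when strict turn alternation is required. `Caps`, `Msg`, and `merge_same_role` are hypothetical stand-ins rather than this module's real types, and the blank-line separator used when joining content is an assumption.

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for `ModelCapabilities`; the real type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Caps(u32);

impl Caps {
    pub const SUPPORTS_SYSTEM_ROLE: Caps = Caps(1 << 0);
    pub const REQUIRES_STRICT_TURNS: Caps = Caps(1 << 1);

    pub const fn empty() -> Caps {
        Caps(0)
    }

    /// True when every bit of `other` is also set in `self`.
    pub const fn contains(self, other: Caps) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Caps {
    type Output = Caps;

    /// OR-combination, as used to merge the two import-time layers.
    fn bitor(self, rhs: Caps) -> Caps {
        Caps(self.0 | rhs.0)
    }
}

/// Simplified message (assumption): a role plus plain-text content.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Msg {
    pub role: String,
    pub content: String,
}

/// Merge consecutive same-role messages, but only when the stored flags
/// say the model requires strict turn alternation.
pub fn merge_same_role(messages: Vec<Msg>, caps: Caps) -> Vec<Msg> {
    if !caps.contains(Caps::REQUIRES_STRICT_TURNS) {
        return messages;
    }
    let mut out: Vec<Msg> = Vec::with_capacity(messages.len());
    for msg in messages {
        if let Some(prev) = out.last_mut() {
            if prev.role == msg.role {
                // Assumed separator: a blank line between merged bodies.
                prev.content.push_str("\n\n");
                prev.content.push_str(&msg.content);
                continue;
            }
        }
        out.push(msg);
    }
    out
}
```

With flags built as `Caps::empty() | Caps::REQUIRES_STRICT_TURNS`, two adjacent user messages collapse into one; with `Caps::empty()` the list passes through unchanged.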

§2. Response-side normalization pipeline

Some models (e.g., Qwen) embed tool-call JSON inside XML tags in the response text. This is handled separately from request rewriting by the format:* tag pipeline in gglib-proxy::normalize, which is entirely independent of ModelCapabilities.
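To make the idea of XML-wrapped tool calls concrete, here is a minimal sketch that pulls the raw payload out of each `<tool_call>` element (the tag this page associates with Qwen). The function name `extract_tool_call_payloads` is hypothetical, and the sketch does not reflect how gglib-proxy::normalize is actually implemented; it only shows the shape of the problem.

```rust
/// Hypothetical helper: returns the trimmed text found between each
/// `<tool_call>` / `</tool_call>` pair, in order of appearance.
pub fn extract_tool_call_payloads(text: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";

    let mut payloads = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                payloads.push(after_open[..end].trim());
                rest = &after_open[end + CLOSE.len()..];
            }
            // Unterminated tag: stop scanning and ignore the remainder.
            None => break,
        }
    }
    payloads
}
```

A real normalizer would go on to parse each payload as JSON and rewrite the response into structured tool calls; that step is omitted here.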

§Template analysis — positive vs. negative signals

infer_from_chat_template uses two kinds of signals for system-role detection, evaluated in priority order:

Priority	Signal	Example pattern	Conclusion
1 (positive)	`[SYSTEM_PROMPT]` in template	Mistral v7	`SUPPORTS_SYSTEM_ROLE` set
1 (positive)	`[AVAILABLE_TOOLS]` in template	Mistral v3/v3-tekken	`SUPPORTS_SYSTEM_ROLE` set
2 (negative)	`"Only user, assistant and tool roles…"`	Old Mistral v1/v2	`SUPPORTS_SYSTEM_ROLE` not set
2 (negative)	`"got system"` / `"Raise exception"`	Other strict models	`SUPPORTS_SYSTEM_ROLE` not set
default	No signal found	Generic template	`SUPPORTS_SYSTEM_ROLE` set

Positive evidence takes precedence: if [SYSTEM_PROMPT] or [AVAILABLE_TOOLS] appears, the negative patterns are ignored for system-role purposes. This matters because some Jinja templates contain both an error-raising branch for unknown roles and a valid system branch guarded by [SYSTEM_PROMPT].
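The priority order in the table can be sketched as a pair of substring checks. This is an illustration only: `template_supports_system_role` is a hypothetical name, the patterns are the literal strings shown in the table, and plain case-sensitive matching is an assumption about how the real infer_from_chat_template compares them.

```rust
/// Hypothetical helper that answers only the system-role question.
/// Positive markers win over negative ones; no signal defaults to `true`.
pub fn template_supports_system_role(template: &str) -> bool {
    const POSITIVE: [&str; 2] = ["[SYSTEM_PROMPT]", "[AVAILABLE_TOOLS]"];
    const NEGATIVE: [&str; 3] = [
        "Only user, assistant and tool roles",
        "got system",
        "Raise exception",
    ];

    // Priority 1: any positive marker settles the question immediately.
    if POSITIVE.iter().any(|p| template.contains(*p)) {
        return true;
    }
    // Priority 2: a negative marker means the system role is rejected.
    if NEGATIVE.iter().any(|n| template.contains(*n)) {
        return false;
    }
    // Default: generic templates are assumed to accept a system role.
    true
}
```

Checking the positive markers first is what lets a template containing both `[SYSTEM_PROMPT]` and a "got system" error branch still be classified as supporting the system role.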

§Architecture registry

capabilities_from_architecture maps GGUF general.architecture strings to ModelCapabilities flags. It is the backstop for models whose quantized builds strip the tokenizer.chat_template section, which makes infer_from_chat_template return empty().

Architecture string	Models	Flags
`"mistral"`	Mistral v1/v2 (old)	`REQUIRES_STRICT_TURNS`
`"mistral3"`	Devstral, Ministral, Mistral Small 3	`REQUIRES_STRICT_TURNS \| SUPPORTS_SYSTEM_ROLE`

To add a new architecture:

1. Add a match arm in capabilities_from_architecture mapping the architecture string to the appropriate flags.
2. Add a unit test in the #[cfg(test)] block at the bottom of this file.
3. If the architecture also needs response-side normalization (XML tool calls, custom reasoning tags, etc.), follow the steps in CONTRIBUTING.md under “Adding a new model architecture” to add a format:* parser as well.

No other files need touching: all call sites already use these functions.
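The first two steps might look roughly like the sketch below. `Caps` and `caps_from_architecture` are hypothetical stand-ins for ModelCapabilities and capabilities_from_architecture, and returning an empty flag set for unknown architectures is an assumption; the real function's signature and fallback may differ.

```rust
/// Hypothetical stand-in for `ModelCapabilities`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Caps(u32);

impl Caps {
    pub const SUPPORTS_SYSTEM_ROLE: Caps = Caps(1 << 0);
    pub const REQUIRES_STRICT_TURNS: Caps = Caps(1 << 1);

    pub const fn empty() -> Caps {
        Caps(0)
    }

    pub const fn union(self, other: Caps) -> Caps {
        Caps(self.0 | other.0)
    }
}

/// Maps a `general.architecture` string to flags, mirroring the table above.
pub fn caps_from_architecture(arch: &str) -> Caps {
    match arch {
        "mistral" => Caps::REQUIRES_STRICT_TURNS,
        "mistral3" => Caps::REQUIRES_STRICT_TURNS.union(Caps::SUPPORTS_SYSTEM_ROLE),
        // Step 1: a new architecture would get its own arm here.
        // Assumption: unknown architectures yield no flags.
        _ => Caps::empty(),
    }
}

// Step 2: a unit test alongside the existing ones.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn mistral3_is_strict_and_supports_system() {
        let expected = Caps::REQUIRES_STRICT_TURNS.union(Caps::SUPPORTS_SYSTEM_ROLE);
        assert_eq!(caps_from_architecture("mistral3"), expected);
    }

    #[test]
    fn unknown_architecture_has_no_flags() {
        assert_eq!(caps_from_architecture("not-a-real-arch"), Caps::empty());
    }
}
```

Keeping the mapping in a single `match` is what makes the "no other files need touching" property plausible: callers only ever see the returned flags.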

Note on Qwen: Qwen is intentionally absent from the registry. Qwen’s quantized builds always ship a full chat template, so infer_from_chat_template handles the request side. Its response-side <tool_call> XML is handled by the format:qwen-xml tag pipeline.

Structs§

ChatMessage: A chat message for transformation.
ModelCapabilities: Model capabilities inferred from chat template analysis.

Enums§

MessageContent: The content of a chat message.

Functions§

capabilities_from_architecture: Map a GGUF general.architecture value to its inherent ModelCapabilities.
infer_from_chat_template: Infer model capabilities from chat template Jinja source and model name.
merge_consecutive_system_messages 🔒: Merge consecutive system messages into a single message.
transform_messages_for_capabilities: Transform chat messages based on model capabilities.