Module capabilities

Model capability detection, inference, and request transformation.

This module owns two orthogonal pipelines that operate on different phases of a model request:

§1. Request-side capability pipeline

Before a chat-completion request is forwarded to llama-server, the proxy consults the stored ModelCapabilities flags to decide whether to rewrite the message list. Flags are inferred at import time and stored in the database; they can also be overridden at any time via the API or CLI.

| Layer | Function | When it fires |
|---|---|---|
| Template analysis | infer_from_chat_template | At model import: reads tokenizer.chat_template from the GGUF |
| Architecture registry | capabilities_from_architecture | At model import: reads general.architecture as a backstop when the GGUF ships without a chat template |
| Request rewriting | transform_messages_for_capabilities | At proxy time: merges consecutive same-role messages for models that require strict turn alternation |
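The request-rewriting step in the table can be pictured with a small sketch. This is not the module's implementation: `SimpleMessage`, `merge_consecutive_same_role`, and the blank-line separator are hypothetical, and the real `ChatMessage` and `MessageContent` types may carry more structure than a plain string.

```rust
/// Hypothetical, simplified message type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleMessage {
    pub role: String,
    pub content: String,
}

/// Merge each run of consecutive messages that share a role into one message.
/// Joining with a blank line is an assumption, not documented behavior.
pub fn merge_consecutive_same_role(messages: Vec<SimpleMessage>) -> Vec<SimpleMessage> {
    let mut merged: Vec<SimpleMessage> = Vec::with_capacity(messages.len());
    for msg in messages {
        match merged.last_mut() {
            // Same role as the previous message: append to it.
            Some(prev) if prev.role == msg.role => {
                prev.content.push_str("\n\n");
                prev.content.push_str(&msg.content);
            }
            // First message, or a role change: start a new turn.
            _ => merged.push(msg),
        }
    }
    merged
}
```

Under these assumptions, two adjacent system messages followed by a user message collapse into two turns, which is the shape a strict-alternation template expects.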

The results of the template-analysis and architecture-registry layers are OR-combined and stored in Model.capabilities. The proxy reads this value once per request via a single catalog lookup.
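To make the OR-combination concrete, here is a minimal sketch using a hand-rolled flag type. `Caps`, its bit values, and `combine_layers` are hypothetical stand-ins; the real `ModelCapabilities` type may be defined differently (for example with a flags crate).

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for `ModelCapabilities`; bit values are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Caps(u32);

impl Caps {
    pub const SUPPORTS_SYSTEM_ROLE: Caps = Caps(0b01);
    pub const REQUIRES_STRICT_TURNS: Caps = Caps(0b10);

    /// No flags set.
    pub const fn empty() -> Caps {
        Caps(0)
    }

    /// True when every bit of `other` is also set in `self`.
    pub const fn contains(self, other: Caps) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for Caps {
    type Output = Caps;
    fn bitor(self, rhs: Caps) -> Caps {
        Caps(self.0 | rhs.0)
    }
}

/// Combine the flags from the two import-time layers with a bitwise OR.
pub fn combine_layers(from_template: Caps, from_architecture: Caps) -> Caps {
    from_template | from_architecture
}
```

With an OR, a flag set by either layer survives, so an empty template result does not erase what the architecture registry contributes.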

§2. Response-side normalization pipeline

Separate from request rewriting, some models (e.g., Qwen) embed tool-call JSON inside XML tags in the response text. This is handled by the format:* tag pipeline in gglib-proxy::normalize, which is entirely independent of ModelCapabilities.
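As a rough illustration of what response-side normalization has to do, the sketch below pulls the payload out of each `<tool_call>` span in a response string. It is not the gglib-proxy::normalize code: `extract_tool_call_payloads` is a hypothetical helper, and it assumes well-formed, non-nested tags and leaves JSON parsing to the caller.

```rust
/// Return the trimmed text inside each `<tool_call>...</tool_call>` span.
/// Hypothetical helper; stops at the first unterminated opening tag.
pub fn extract_tool_call_payloads(text: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";

    let mut payloads = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                payloads.push(after_open[..end].trim());
                // Continue scanning after the closing tag.
                rest = &after_open[end + CLOSE.len()..];
            }
            None => break,
        }
    }
    payloads
}
```

Because this step only looks at response text, it needs no knowledge of the request-side flags, which matches the separation described above.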

§Template analysis — positive vs. negative signals

infer_from_chat_template uses two kinds of signals for system-role detection, evaluated in priority order:

| Priority | Signal | Example pattern | Conclusion |
|---|---|---|---|
| 1 (positive) | [SYSTEM_PROMPT] in template | Mistral v7 | SUPPORTS_SYSTEM_ROLE set |
| 1 (positive) | [AVAILABLE_TOOLS] in template | Mistral v3/v3-tekken | SUPPORTS_SYSTEM_ROLE set |
| 2 (negative) | "Only user, assistant and tool roles…" | Old Mistral v1/v2 | SUPPORTS_SYSTEM_ROLE not set |
| 2 (negative) | "got system" / "Raise exception" | Other strict models | SUPPORTS_SYSTEM_ROLE not set |
| default | No signal found | Generic template | SUPPORTS_SYSTEM_ROLE set |

Positive evidence takes precedence: if [SYSTEM_PROMPT] or [AVAILABLE_TOOLS] appears, the negative patterns are ignored for system-role purposes. This matters because some Jinja templates contain both an error-raise branch for unknown roles AND a valid system branch guarded by [SYSTEM_PROMPT].
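The priority order can be sketched as a plain substring check. This is a simplified illustration, not the body of `infer_from_chat_template`: the function name is hypothetical, matching is assumed to be case-sensitive substring search, and the sketch assumes a template is present (the missing-template case is handled separately by the architecture registry).

```rust
/// Hypothetical system-role check: returns true when SUPPORTS_SYSTEM_ROLE
/// would be set according to the signal table.
pub fn template_supports_system_role(template: &str) -> bool {
    const POSITIVE: [&str; 2] = ["[SYSTEM_PROMPT]", "[AVAILABLE_TOOLS]"];
    const NEGATIVE: [&str; 3] = [
        "Only user, assistant and tool roles",
        "got system",
        "Raise exception",
    ];

    // Priority 1: positive evidence wins, even if a negative pattern also appears.
    if POSITIVE.iter().any(|p| template.contains(*p)) {
        return true;
    }
    // Priority 2: negative evidence clears the flag.
    if NEGATIVE.iter().any(|n| template.contains(*n)) {
        return false;
    }
    // Default: generic templates are assumed to accept a system role.
    true
}
```

Checking the positive patterns first is what lets a template with both a `[SYSTEM_PROMPT]` branch and an error-raise branch still be classified as supporting the system role.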

§Architecture registry

capabilities_from_architecture maps GGUF general.architecture strings to ModelCapabilities flags. This is the backstop for models whose quantized builds strip the tokenizer.chat_template section, making infer_from_chat_template return empty().

| Architecture string | Models | Flags |
|---|---|---|
| "mistral" | Mistral v1/v2 (old) | REQUIRES_STRICT_TURNS |
| "mistral3" | Devstral, Ministral, Mistral Small 3 | REQUIRES_STRICT_TURNS \| SUPPORTS_SYSTEM_ROLE |

To add a new architecture:

  1. Add a match arm in capabilities_from_architecture mapping the architecture string to the appropriate flags.
  2. Add a unit test in the #[cfg(test)] block at the bottom of this file.
  3. If the architecture also needs response-side normalization (XML tool calls, custom reasoning tags, etc.), follow the steps in CONTRIBUTING.md under “Adding a new model architecture” to add a format:* parser as well.
  4. No other files need touching — all call sites already use these functions.
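Steps 1 and 2 might look roughly like the sketch below. Everything here is illustrative: `ArchCaps` and `arch_caps` are hypothetical stand-ins for `ModelCapabilities` and `capabilities_from_architecture`, and returning an empty flag set for unknown architectures is an assumption.

```rust
/// Hypothetical stand-in for `ModelCapabilities`, using plain booleans.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArchCaps {
    pub requires_strict_turns: bool,
    pub supports_system_role: bool,
}

impl ArchCaps {
    /// No flags set.
    pub const NONE: ArchCaps = ArchCaps {
        requires_strict_turns: false,
        supports_system_role: false,
    };
}

/// Map a GGUF `general.architecture` string to flags (sketch only).
pub fn arch_caps(architecture: &str) -> ArchCaps {
    match architecture {
        "mistral" => ArchCaps {
            requires_strict_turns: true,
            supports_system_role: false,
        },
        "mistral3" => ArchCaps {
            requires_strict_turns: true,
            supports_system_role: true,
        },
        // Step 1: a new architecture would get its own arm here.
        // Assumption: unknown architectures contribute no flags.
        _ => ArchCaps::NONE,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Step 2: one unit test per registered architecture.
    #[test]
    fn mistral3_is_strict_and_supports_system() {
        let caps = arch_caps("mistral3");
        assert!(caps.requires_strict_turns);
        assert!(caps.supports_system_role);
    }
}
```

Keeping the registry as a single `match` means a new architecture is a one-arm change plus a test, which is consistent with step 4's claim that no call sites need editing.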

Note on Qwen: Qwen is intentionally absent from the registry. Qwen’s quantized builds always ship a full chat template, so infer_from_chat_template handles the request side. Its response-side <tool_call> XML is handled by the format:qwen-xml tag pipeline.

Structs§

ChatMessage
A chat message for transformation.
ModelCapabilities
Model capabilities inferred from chat template analysis.

Enums§

MessageContent
The content of a chat message.

Functions§

capabilities_from_architecture
Map a GGUF general.architecture value to its inherent ModelCapabilities.
infer_from_chat_template
Infer model capabilities from chat template Jinja source and model name.
merge_consecutive_system_messages 🔒
Merge consecutive system messages into a single message.
transform_messages_for_capabilities
Transform chat messages based on model capabilities.