Model capability detection, inference, and request transformation.
This module owns two orthogonal pipelines that operate on different phases of a model request:
§1. Request-side capability pipeline
Before a chat-completion request is forwarded to llama-server the proxy
consults the stored ModelCapabilities flags to decide whether to rewrite
the message list. Flags are inferred at import time and stored in the
database; they can also be overridden at any time via the API or CLI.
| Layer | Function | When it fires |
|---|---|---|
| Template analysis | infer_from_chat_template | At model import — reads tokenizer.chat_template from the GGUF |
| Architecture registry | capabilities_from_architecture | At model import — reads general.architecture as a backstop when the GGUF ships without a chat template |
| Request rewriting | transform_messages_for_capabilities | At proxy time — merges consecutive same-role messages for models that require strict turn alternation |
The results of the first two layers (template analysis and the architecture
registry) are OR-combined and stored in Model.capabilities. The proxy reads
this value once per request via a single catalog lookup.
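The request-rewriting step can be illustrated with a minimal sketch. It assumes a simplified message type (`Msg`, a hypothetical stand-in for the crate's `ChatMessage`) and joins merged contents with a blank line; the real `transform_messages_for_capabilities` may use a different separator and handles richer content than plain strings.

```rust
/// Hypothetical, simplified message type. The crate's `ChatMessage` and
/// `MessageContent` carry more structure than a plain string.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Msg {
    pub role: String,
    pub content: String,
}

/// Merge runs of consecutive messages that share a role so the resulting
/// list alternates roles. The "\n\n" separator is an assumption.
pub fn merge_consecutive_same_role(messages: Vec<Msg>) -> Vec<Msg> {
    let mut merged: Vec<Msg> = Vec::with_capacity(messages.len());
    for msg in messages {
        if let Some(prev) = merged.last_mut() {
            if prev.role == msg.role {
                // Same role as the previous message: append instead of pushing.
                prev.content.push_str("\n\n");
                prev.content.push_str(&msg.content);
                continue;
            }
        }
        merged.push(msg);
    }
    merged
}
```

In this sketch the rewrite would only be applied when the stored flags contain REQUIRES_STRICT_TURNS; models without that flag receive the message list unchanged.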
§2. Response-side normalization pipeline
Separate from request rewriting, some models (e.g., Qwen) embed tool-call
JSON inside XML tags in the response text. This is handled by the
format:* tag pipeline in gglib-proxy::normalize, which is entirely
independent from ModelCapabilities.
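To make the response-side problem concrete, here is a rough sketch of pulling tool-call payloads out of response text. It assumes the `<tool_call>` tag that this page names for Qwen; the function name is hypothetical and this is not the parser in gglib-proxy::normalize, which may handle streaming, nesting, and malformed output differently.

```rust
/// Return the trimmed payload of every `<tool_call>...</tool_call>` span.
/// An opening tag without a matching close ends the scan.
/// (Hypothetical helper, for illustration only.)
pub fn extract_tool_call_payloads(text: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";

    let mut payloads = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                payloads.push(after_open[..end].trim());
                rest = &after_open[end + CLOSE.len()..];
            }
            // Unterminated tag: stop rather than guess where it ends.
            None => break,
        }
    }
    payloads
}
```

Each returned payload would still need to be parsed as JSON and validated before being surfaced as a structured tool call.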
§Template analysis — positive vs. negative signals
infer_from_chat_template uses two kinds of signals for system-role
detection, evaluated in priority order:
| Priority | Signal | Example pattern | Conclusion |
|---|---|---|---|
| 1 (positive) | [SYSTEM_PROMPT] in template | Mistral v7 | SUPPORTS_SYSTEM_ROLE set |
| 1 (positive) | [AVAILABLE_TOOLS] in template | Mistral v3/v3-tekken | SUPPORTS_SYSTEM_ROLE set |
| 2 (negative) | "Only user, assistant and tool roles…" | Old Mistral v1/v2 | SUPPORTS_SYSTEM_ROLE not set |
| 2 (negative) | "got system" / "Raise exception" | Other strict models | SUPPORTS_SYSTEM_ROLE not set |
| default | No signal found | Generic template | SUPPORTS_SYSTEM_ROLE set |
Positive evidence takes precedence: if [SYSTEM_PROMPT] or [AVAILABLE_TOOLS]
appears, the negative patterns are ignored for system-role purposes. This
matters because some Jinja templates contain both an error-raise branch for
unknown roles AND a valid system branch guarded by [SYSTEM_PROMPT].
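The priority order above can be sketched as a small decision function. This is an illustration under assumptions: it uses plain substring checks on the patterns listed in the table, and the function name is hypothetical. The actual infer_from_chat_template also takes the model name and returns a full set of flags.

```rust
/// Decide whether a chat template supports the system role.
/// Positive markers win over negative ones; no signal defaults to `true`.
/// (Hypothetical helper using substring matching as an assumption.)
pub fn supports_system_role(template: &str) -> bool {
    const POSITIVE: [&str; 2] = ["[SYSTEM_PROMPT]", "[AVAILABLE_TOOLS]"];
    const NEGATIVE: [&str; 3] = [
        "Only user, assistant and tool roles",
        "got system",
        "Raise exception",
    ];

    // Priority 1: positive evidence short-circuits everything else.
    if POSITIVE.iter().any(|p| template.contains(*p)) {
        return true;
    }
    // Priority 2: negative evidence only counts when nothing positive matched.
    if NEGATIVE.iter().any(|p| template.contains(*p)) {
        return false;
    }
    // Default: generic templates are assumed to accept a system role.
    true
}
```

Checking the positive markers first is what lets a template with both an error-raise branch and a [SYSTEM_PROMPT] branch resolve to "supported".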
§Architecture registry
capabilities_from_architecture maps GGUF general.architecture strings
to ModelCapabilities flags. This is the backstop for models whose
quantized builds strip the tokenizer.chat_template section, making
infer_from_chat_template return empty().
| Architecture string | Models | Flags |
|---|---|---|
| "mistral" | Mistral v1/v2 (old) | REQUIRES_STRICT_TURNS |
| "mistral3" | Devstral, Ministral, Mistral Small 3 | REQUIRES_STRICT_TURNS \| SUPPORTS_SYSTEM_ROLE |
To add a new architecture:
- Add a match arm in capabilities_from_architecture mapping the architecture string to the appropriate flags.
- Add a unit test in the #[cfg(test)] block at the bottom of this file.
- If the architecture also needs response-side normalization (XML tool calls, custom reasoning tags, etc.), follow the steps in CONTRIBUTING.md under “Adding a new model architecture” to add a format:* parser as well.
- No other files need touching; all call sites already use these functions.
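As a rough illustration of the first step, the registry can be pictured as a single `match` over the architecture string. The `Caps` type below is a hypothetical stand-in for ModelCapabilities with an assumed bit layout, and returning an empty set for unknown architectures is also an assumption; only the two mappings come from the table above.

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for `ModelCapabilities`; the bit values are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Caps(u8);

impl Caps {
    pub const SUPPORTS_SYSTEM_ROLE: Caps = Caps(0b01);
    pub const REQUIRES_STRICT_TURNS: Caps = Caps(0b10);

    pub const fn empty() -> Caps {
        Caps(0)
    }

    /// True when every bit set in `other` is also set in `self`.
    pub fn contains(self, other: Caps) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for Caps {
    type Output = Caps;
    fn bitor(self, rhs: Caps) -> Caps {
        Caps(self.0 | rhs.0)
    }
}

/// Sketch of the registry lookup (hypothetical name and fallback).
pub fn caps_from_architecture(arch: &str) -> Caps {
    match arch {
        "mistral" => Caps::REQUIRES_STRICT_TURNS,
        "mistral3" => Caps::REQUIRES_STRICT_TURNS | Caps::SUPPORTS_SYSTEM_ROLE,
        // A new architecture would get its own arm here.
        _ => Caps::empty(),
    }
}
```

A matching unit test would assert the flags for the new string, for example that `caps_from_architecture("mistral3")` contains both REQUIRES_STRICT_TURNS and SUPPORTS_SYSTEM_ROLE.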
Note on Qwen: Qwen is intentionally absent from the registry. Qwen’s
quantized builds always ship a full chat template, so
infer_from_chat_template handles the request side. Its response-side
<tool_call> XML is handled by the format:qwen-xml tag pipeline.
Structs§
- ChatMessage - A chat message for transformation.
- ModelCapabilities - Model capabilities inferred from chat template analysis.
Enums§
- MessageContent - The content of a chat message.
Functions§
- capabilities_from_architecture - Map a GGUF general.architecture value to its inherent ModelCapabilities.
- infer_from_chat_template - Infer model capabilities from chat template Jinja source and model name.
- merge_consecutive_system_messages 🔒 - Merge consecutive system messages into a single message.
- transform_messages_for_capabilities - Transform chat messages based on model capabilities.