pub struct InferenceConfig {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub top_k: Option<i32>,
    pub max_tokens: Option<u32>,
    pub repeat_penalty: Option<f32>,
    pub presence_penalty: Option<f32>,
    pub min_p: Option<f32>,
}
Inference parameters for LLM sampling.
All fields are optional to support partial configuration and fallback chains. Intended to be shared across model defaults, global settings, and request overrides.
§Hierarchy Resolution
When making an inference request, parameters are resolved in this order:
- Request-level override (user specified for this request)
- Per-model defaults (stored in Model.inference_defaults)
- Global settings (stored in Settings.inference_defaults)
- Hardcoded fallback (e.g., temperature = 0.7)
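The resolution order above can be sketched with `Option::or`, where the first layer holding `Some` wins for each field. This is a minimal illustration, not the crate's implementation: `Config` is a trimmed-down stand-in for `InferenceConfig`, and `resolve` and `resolve_example` are hypothetical helpers. The fallback values (0.7 and 0.9) are the ones mentioned on this page.

```rust
/// Trimmed-down stand-in for `InferenceConfig` (two fields only, for brevity).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Config {
    temperature: Option<f32>,
    top_p: Option<f32>,
}

/// Hypothetical helper: per field, the first layer that holds `Some` wins.
/// Layers are passed from highest to lowest priority.
fn resolve(request: Config, model: Config, global: Config, fallback: Config) -> Config {
    Config {
        temperature: request
            .temperature
            .or(model.temperature)
            .or(global.temperature)
            .or(fallback.temperature),
        top_p: request
            .top_p
            .or(model.top_p)
            .or(global.top_p)
            .or(fallback.top_p),
    }
}

/// Hypothetical usage: the request sets only `temperature`,
/// the model sets only `top_p`.
fn resolve_example() -> Config {
    let request = Config { temperature: Some(0.2), ..Default::default() };
    let model = Config { top_p: Some(0.95), ..Default::default() };
    let global = Config { temperature: Some(1.0), top_p: None };
    let fallback = Config { temperature: Some(0.7), top_p: Some(0.9) };
    resolve(request, model, global, fallback)
}
```

In this sketch the request's temperature takes precedence over the global one, while `top_p` comes from the model defaults because the request leaves it unset.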
§Examples
use gglib_core::domain::InferenceConfig;
// Conservative settings for code generation
let code_gen = InferenceConfig {
    temperature: Some(0.2),
    top_p: Some(0.9),
    top_k: Some(40),
    max_tokens: Some(2048),
    repeat_penalty: Some(1.1),
    presence_penalty: None,
    min_p: None,
};
// Creative writing settings
let creative = InferenceConfig {
    temperature: Some(1.2),
    top_p: Some(0.95),
    ..Default::default()
};
Fields§
temperature: Option<f32>
Sampling temperature (0.0 - 2.0).
Controls randomness in token selection:
- Lower values (0.1-0.5): More deterministic, focused
- Medium values (0.7-1.0): Balanced creativity
- Higher values (1.1-2.0): More random, creative
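To make the effect concrete, here is a minimal sketch of temperature-scaled softmax, the standard way temperature is applied to logits. The function name `softmax_with_temperature` is hypothetical and not part of this crate; the sketch assumes a non-empty slice and a temperature greater than zero.

```rust
/// Softmax over `logits` after dividing each logit by `temperature`.
/// Assumes `temperature > 0.0` and a non-empty slice.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Lower temperatures stretch the gaps between logits, higher ones shrink them.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

With the same logits, a low temperature concentrates probability on the top token, while a high temperature spreads it across more candidates.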
top_p: Option<f32>
Nucleus sampling threshold (0.0 - 1.0).
Considers only the smallest set of most likely tokens whose cumulative probability reaches this threshold. Common values: 0.9 (default), 0.95 (more diverse)
top_k: Option<i32>
Top-K sampling limit.
Considers only the K most likely next tokens. Common values: 40 (default), 10 (focused), 100 (diverse)
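The two truncation filters described above (top_p and top_k) can be sketched as follows. This is an illustrative version operating on an already-normalized probability vector, not the llama.cpp implementation; the function names are hypothetical, and renormalizing the surviving probabilities is left out for brevity.

```rust
/// Pair each probability with its token id and sort by descending probability.
fn sorted_desc(probs: &[f32]) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed
}

/// Top-K: keep the ids of the `k` most likely tokens.
fn top_k_filter(probs: &[f32], k: usize) -> Vec<usize> {
    sorted_desc(probs)
        .into_iter()
        .take(k)
        .map(|(id, _)| id)
        .collect()
}

/// Top-P: keep the shortest prefix of the sorted tokens whose
/// cumulative probability reaches `p`.
fn top_p_filter(probs: &[f32], p: f32) -> Vec<usize> {
    let mut kept = Vec::new();
    let mut cumulative = 0.0_f32;
    for (id, prob) in sorted_desc(probs) {
        kept.push(id);
        cumulative += prob;
        if cumulative >= p {
            break;
        }
    }
    kept
}
```

Both functions return token ids ordered from most to least likely; a sampler would then draw from the renormalized probabilities of those ids.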
max_tokens: Option<u32>
Maximum tokens to generate in response.
Hard limit on response length. Does not include input tokens.
repeat_penalty: Option<f32>
Repetition penalty (> 0.0, typically 1.0 - 1.3).
Penalizes repeated tokens to reduce repetitive output.
- 1.0: No penalty (default)
- 1.1-1.3: Moderate penalty
- > 1.3: Strong penalty (may hurt coherence)
presence_penalty: Option<f32>
Presence penalty (0.0 - 2.0).
Penalizes tokens that have already appeared in the output, encouraging the model to cover new ground. Effective at preventing repetitive reasoning loops in thinking models.
- 0.0: No penalty (default; disabled)
- 1.5: Recommended for reasoning/thinking models (e.g. Qwen3.6, DeepSeek-R1)
- > 2.0: Avoid; may degrade coherence
min_p: Option<f32>
Minimum-probability sampling threshold (0.0 - 1.0).
Removes tokens whose probability is below min_p × P(top token).
- 0.0: Disabled (explicit off; recommended by Qwen3.6)
- 0.05: llama.cpp built-in default when the flag is omitted
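The cutoff rule above (drop tokens below min_p × P(top token)) can be sketched as a small filter. This is an illustrative version over an already-normalized probability vector, not the llama.cpp implementation; the function name `min_p_filter` is hypothetical.

```rust
/// Keep the ids of tokens whose probability is at least
/// `min_p` times the probability of the most likely token.
fn min_p_filter(probs: &[f32], min_p: f32) -> Vec<usize> {
    // Probability of the most likely token (0.0 for an empty slice).
    let top = probs.iter().copied().fold(0.0_f32, f32::max);
    let cutoff = min_p * top;
    probs
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, p)| p >= cutoff)
        .map(|(id, _)| id)
        .collect()
}
```

Because the cutoff scales with the top token's probability, the filter is strict when the model is confident and permissive when the distribution is flat. With `min_p = 0.0` the cutoff is zero and every token is kept, which matches the "disabled" setting.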
Implementations§
impl InferenceConfig
pub const fn merge_with(&mut self, other: &Self)
Merge another config into this one, keeping any values already set on self.
For each field, if self has Some(value), keep it; otherwise take other’s value.
This is useful for applying fallback chains.
§Example
use gglib_core::domain::InferenceConfig;
let mut request = InferenceConfig {
    temperature: Some(0.8),
    ..Default::default()
};
let model_defaults = InferenceConfig {
    temperature: Some(0.5),
    top_p: Some(0.9),
    ..Default::default()
};
request.merge_with(&model_defaults);
assert_eq!(request.temperature, Some(0.8)); // Request value wins
assert_eq!(request.top_p, Some(0.9)); // Fallback to model default
pub const fn with_hardcoded_defaults() -> Self
Create a new config with all fields set to sensible defaults.
These are the hardcoded fallback values used when no other defaults are configured.
pub fn to_cli_args(&self) -> Vec<String>
Convert inference config to llama CLI arguments.
Returns a vector of argument strings suitable for passing to llama-server.
Uses the same flag names as llama.cpp: --temp, --top-p, --top-k, -n, --repeat-penalty.
This is the single source of truth for CLI flag conversion, used by:
- LlamaCommandBuilder (for CLI commands)
- GUI server startup (via ServerConfig.extra_args)
§Example
use gglib_core::domain::InferenceConfig;
let config = InferenceConfig {
    temperature: Some(0.8),
    top_p: Some(0.9),
    top_k: None,
    max_tokens: Some(1024),
    repeat_penalty: None,
    presence_penalty: None,
    min_p: None,
};
let args = config.to_cli_args();
assert_eq!(args, vec!["--temp", "0.8", "--top-p", "0.9", "-n", "1024"]);
pub const fn reasoning_profile() -> Self
Return a recommended InferenceConfig profile for reasoning / thinking models.
Applied automatically at import time when the "reasoning" capability tag is
detected (e.g. Qwen3.6, DeepSeek-R1, QwQ). Values follow the Qwen3.6 upstream
guidance for thinking mode (general tasks) and are conservative enough to
work well across all thinking-capable models.
| Parameter | Value | Rationale |
|---|---|---|
| temperature | 1.0 | Recommended thinking-mode baseline |
| top_p | 0.95 | Broad nucleus; standard for reasoning |
| top_k | 20 | Tighter than the 40 fallback; suppresses low-quality tokens |
| max_tokens | 8192 | Safe out-of-the-box ceiling; increase for complex tasks |
| repeat_penalty | 1.0 | No penalty; presence_penalty handles anti-repetition |
| presence_penalty | 1.5 | Prevents repetitive reasoning loops |
| min_p | 0.0 | Explicitly disabled per Qwen3.6 spec |
Users can override any parameter with gglib model update <id> --<flag> or
the equivalent UI control.
Trait Implementations§
impl Clone for InferenceConfig
fn clone(&self) -> InferenceConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.