pub struct InferenceConfig {
pub temperature: Option<f32>,
pub top_p: Option<f32>,
pub top_k: Option<i32>,
pub max_tokens: Option<u32>,
pub repeat_penalty: Option<f32>,
}
Inference parameters for LLM sampling.
All fields are optional to support partial configuration and fallback chains. Intended to be shared across model defaults, global settings, and request overrides.
§Hierarchy Resolution
When making an inference request, parameters are resolved in this order:
- Request-level override (user specified for this request)
- Per-model defaults (stored in Model.inference_defaults)
- Global settings (stored in Settings.inference_defaults)
- Hardcoded fallback (e.g., temperature = 0.7)
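A minimal sketch of how such a chain can be applied with merge_with and with_hardcoded_defaults (both documented below); the layer variables are illustrative, and the sketch assumes merge_with only fills fields that are still None:
use gglib_core::domain::InferenceConfig;
// Layers, from most to least specific (illustrative values).
let request_override = InferenceConfig { temperature: Some(0.9), ..Default::default() };
let model_defaults = InferenceConfig { top_p: Some(0.95), ..Default::default() };
let global_settings = InferenceConfig { max_tokens: Some(1024), ..Default::default() };
// Start from the request and fill the gaps from each successive layer.
let mut resolved = request_override.clone();
resolved.merge_with(&model_defaults);
resolved.merge_with(&global_settings);
resolved.merge_with(&InferenceConfig::with_hardcoded_defaults());
assert_eq!(resolved.temperature, Some(0.9)); // request override wins
assert_eq!(resolved.top_p, Some(0.95)); // filled from model defaults
assert_eq!(resolved.max_tokens, Some(1024)); // filled from global settings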
§Examples
use gglib_core::domain::InferenceConfig;
// Conservative settings for code generation
let code_gen = InferenceConfig {
temperature: Some(0.2),
top_p: Some(0.9),
top_k: Some(40),
max_tokens: Some(2048),
repeat_penalty: Some(1.1),
};
// Creative writing settings
let creative = InferenceConfig {
temperature: Some(1.2),
top_p: Some(0.95),
..Default::default()
};
Fields§
§temperature: Option<f32>
Sampling temperature (0.0 - 2.0).
Controls randomness in token selection:
- Lower values (0.1-0.5): More deterministic, focused
- Medium values (0.7-1.0): Balanced creativity
- Higher values (1.1-2.0): More random, creative
§top_p: Option<f32>
Nucleus sampling threshold (0.0 - 1.0).
Considers only the top tokens whose cumulative probability exceeds this threshold. Common values: 0.9 (default), 0.95 (more diverse)
§top_k: Option<i32>
Top-K sampling limit.
Considers only the K most likely next tokens. Common values: 40 (default), 10 (focused), 100 (diverse)
§max_tokens: Option<u32>
Maximum tokens to generate in the response.
Hard limit on response length. Does not include input tokens.
§repeat_penalty: Option<f32>
Repetition penalty (> 0.0, typically 1.0 - 1.3).
Penalizes repeated tokens to reduce repetitive output.
- 1.0: No penalty (default)
- 1.1-1.3: Moderate penalty
- > 1.3: Strong penalty (may hurt coherence)
Implementations§
impl InferenceConfig
pub const fn merge_with(&mut self, other: &Self)
Merge another config into this one, filling in any missing values from other.
For each field, if self is None and other has Some(value), take other's value; otherwise keep self's value.
This is useful for applying fallback chains.
§Example
use gglib_core::domain::InferenceConfig;
let mut request = InferenceConfig {
temperature: Some(0.8),
..Default::default()
};
let model_defaults = InferenceConfig {
temperature: Some(0.5),
top_p: Some(0.9),
..Default::default()
};
request.merge_with(&model_defaults);
assert_eq!(request.temperature, Some(0.8)); // Request value wins
assert_eq!(request.top_p, Some(0.9)); // Fallback to model default
pub const fn with_hardcoded_defaults() -> Self
Create a new config with all fields set to sensible defaults.
These are the hardcoded fallback values used when no other defaults are configured.
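A small usage sketch; the exact fallback values are whatever the crate hardcodes (the hierarchy notes above give temperature = 0.7 as an example), so this only checks that every field comes back populated:
use gglib_core::domain::InferenceConfig;
let defaults = InferenceConfig::with_hardcoded_defaults();
// All fields are Some(_), so merging this in last completes any partial config.
assert!(defaults.temperature.is_some());
assert!(defaults.top_p.is_some());
assert!(defaults.top_k.is_some());
assert!(defaults.max_tokens.is_some());
assert!(defaults.repeat_penalty.is_some());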
pub fn to_cli_args(&self) -> Vec<String>
Convert inference config to llama CLI arguments.
Returns a vector of argument strings suitable for passing to llama-cli or llama-server.
Uses the same flag names as llama.cpp: --temp, --top-p, --top-k, -n, --repeat-penalty.
This is the single source of truth for CLI flag conversion, used by:
- LlamaCommandBuilder (for CLI commands)
- GUI server startup (via ServerConfig.extra_args)
§Example
use gglib_core::domain::InferenceConfig;
let config = InferenceConfig {
temperature: Some(0.8),
top_p: Some(0.9),
top_k: None,
max_tokens: Some(1024),
repeat_penalty: None,
};
let args = config.to_cli_args();
assert_eq!(args, vec!["--temp", "0.8", "--top-p", "0.9", "-n", "1024"]);
Trait Implementations§
impl Clone for InferenceConfig
fn clone(&self) -> InferenceConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.