Struct InferenceConfig

pub struct InferenceConfig {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub top_k: Option<i32>,
    pub max_tokens: Option<u32>,
    pub repeat_penalty: Option<f32>,
}

Inference parameters for LLM sampling.

All fields are optional to support partial configuration and fallback chains. The type is intended to be shared across model defaults, global settings, and request overrides.

§Hierarchy Resolution

When making an inference request, each parameter is resolved independently, using the first source in this order that provides a value:

1. Request-level override (specified by the user for this request)
2. Per-model defaults (stored in Model.inference_defaults)
3. Global settings (stored in Settings.inference_defaults)
4. Hardcoded fallback (e.g., temperature = 0.7)
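
The resolution order above could be sketched roughly as follows. This is an illustration, not the crate's implementation: the struct is a two-field stand-in for InferenceConfig so the snippet compiles on its own, resolve is a hypothetical helper, and the hardcoded top_p of 0.9 is an assumption taken from the "default" noted in the field docs. The stand-in merge_with keeps values already set, matching the behaviour asserted in the merge_with example.

```rust
// Two-field stand-in for gglib_core::domain::InferenceConfig (assumption,
// trimmed so the sketch is self-contained).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct InferenceConfig {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
}

impl InferenceConfig {
    // Stand-in merge: keep fields already set on `self`, fill the rest from `other`.
    pub fn merge_with(&mut self, other: &Self) {
        if self.temperature.is_none() {
            self.temperature = other.temperature;
        }
        if self.top_p.is_none() {
            self.top_p = other.top_p;
        }
    }
}

// Hypothetical resolver: request > model defaults > global settings > hardcoded.
pub fn resolve(
    request: &InferenceConfig,
    model: &InferenceConfig,
    global: &InferenceConfig,
) -> InferenceConfig {
    // Assumed hardcoded fallbacks; only temperature = 0.7 is stated in the docs.
    let hardcoded = InferenceConfig {
        temperature: Some(0.7),
        top_p: Some(0.9),
    };
    let mut resolved = request.clone();
    // Each lower-priority layer only fills fields that are still unset.
    for fallback in [model, global, &hardcoded] {
        resolved.merge_with(fallback);
    }
    resolved
}
```

Starting from the request and merging lower-priority layers in turn means a field is never overwritten once any higher-priority source has set it.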

§Examples

use gglib_core::domain::InferenceConfig;

// Conservative settings for code generation
let code_gen = InferenceConfig {
    temperature: Some(0.2),
    top_p: Some(0.9),
    top_k: Some(40),
    max_tokens: Some(2048),
    repeat_penalty: Some(1.1),
};

// Creative writing settings
let creative = InferenceConfig {
    temperature: Some(1.2),
    top_p: Some(0.95),
    ..Default::default()
};

Fields§

§temperature: Option<f32>

Sampling temperature (0.0 - 2.0).

Controls randomness in token selection:

Lower values (0.1-0.5): More deterministic, focused
Medium values (0.7-1.0): Balanced creativity
Higher values (1.1-2.0): More random, creative
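
To illustrate why lower temperatures are more deterministic, here is a minimal sketch of temperature-scaled softmax. It is a textbook formulation, not code from this crate; softmax_with_temperature is a hypothetical name, and the sketch assumes temperature > 0.0 (a temperature of exactly 0.0 would need a separate greedy path).

```rust
/// Convert raw logits into probabilities after dividing by `temperature`.
/// Assumes `temperature > 0.0`.
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Dividing by a small temperature widens the gaps between logits,
    // which concentrates probability on the top token.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

With the same logits, a temperature of 0.2 assigns a larger share of probability to the most likely token than a temperature of 1.5 does.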

§top_p: Option<f32>

Nucleus sampling threshold (0.0 - 1.0).

Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches this threshold. Common values: 0.9 (default), 0.95 (more diverse)
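
A minimal sketch of the nucleus filter, assuming the input is an already-normalized probability distribution. top_p_filter is a hypothetical helper for illustration; the actual filtering happens inside the inference backend, not in this struct.

```rust
/// Return the indices of the tokens kept by nucleus (top-p) filtering,
/// ordered from most to least likely.
pub fn top_p_filter(probs: &[f32], top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    // Sort token indices by descending probability.
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    let mut kept = Vec::new();
    let mut cumulative = 0.0_f32;
    for idx in order {
        kept.push(idx);
        cumulative += probs[idx];
        // Stop as soon as the kept set covers the requested probability mass.
        if cumulative >= top_p {
            break;
        }
    }
    kept
}
```

A higher threshold keeps more of the low-probability tail, which is why 0.95 is described as more diverse than 0.9.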

§top_k: Option<i32>

Top-K sampling limit.

Considers only the K most likely next tokens. Common values: 40 (default), 10 (focused), 100 (diverse)
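
As an illustration, a top-K filter could look like the sketch below. top_k_filter is a hypothetical helper, and treating a non-positive K as "no limit" is an assumption made here to give the signed i32 type a meaning; check the backend's behaviour before relying on it.

```rust
/// Return the indices of the `top_k` highest logits, most likely first.
pub fn top_k_filter(logits: &[f32], top_k: i32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..logits.len()).collect();
    // Sort token indices by descending logit.
    order.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    // Assumption: K <= 0 disables the filter and keeps every token.
    let k = if top_k <= 0 {
        logits.len()
    } else {
        (top_k as usize).min(logits.len())
    };
    order.truncate(k);
    order
}
```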

§max_tokens: Option<u32>

Maximum tokens to generate in response.

Hard limit on response length. Does not include input tokens.

§repeat_penalty: Option<f32>

Repetition penalty (> 0.0, typically 1.0 - 1.3).

Penalizes repeated tokens to reduce repetitive output.

1.0: No penalty (default)
1.1-1.3: Moderate penalty
> 1.3: Strong penalty (may hurt coherence)
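
The sketch below shows one common way such a penalty is applied to logits: positive logits of recently seen tokens are divided by the penalty and negative ones are multiplied, so both move toward "less likely". This formulation is an assumption for illustration; apply_repeat_penalty is a hypothetical name and the backend's exact rule may differ.

```rust
use std::collections::HashSet;

/// Lower the logits of tokens that appear in `recent_tokens`.
/// A `penalty` of 1.0 leaves the logits unchanged.
pub fn apply_repeat_penalty(logits: &mut [f32], recent_tokens: &[usize], penalty: f32) {
    let mut seen = HashSet::new();
    for &token in recent_tokens {
        // Skip out-of-range ids and penalize each distinct token only once.
        if token >= logits.len() || !seen.insert(token) {
            continue;
        }
        if logits[token] > 0.0 {
            logits[token] /= penalty;
        } else {
            logits[token] *= penalty;
        }
    }
}
```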

Implementations§

impl InferenceConfig

pub const fn merge_with(&mut self, other: &Self)

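Merge another config into this one, keeping values already set on self.

For each field, if self has Some(value), it is kept; otherwise the value from other is used. This is useful for applying fallback chains: call it on the highest-priority config and pass each lower-priority config in turn.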

§Example

use gglib_core::domain::InferenceConfig;

let mut request = InferenceConfig {
    temperature: Some(0.8),
    ..Default::default()
};

let model_defaults = InferenceConfig {
    temperature: Some(0.5),
    top_p: Some(0.9),
    ..Default::default()
};

request.merge_with(&model_defaults);
assert_eq!(request.temperature, Some(0.8)); // Request value wins
assert_eq!(request.top_p, Some(0.9));      // Fallback to model default

pub const fn with_hardcoded_defaults() -> Self

Create a new config with all fields set to sensible defaults.

These are the hardcoded fallback values used when no other defaults are configured.

pub fn to_cli_args(&self) -> Vec<String>

Convert the inference config to llama.cpp CLI arguments.

Returns a vector of argument strings suitable for passing to llama-cli or llama-server. Uses the same flag names as llama.cpp: --temp, --top-p, --top-k, -n, --repeat-penalty. Fields that are None are omitted, as the example below shows.

This is the single source of truth for CLI flag conversion, used by:

LlamaCommandBuilder (for CLI commands)
GUI server startup (via ServerConfig.extra_args)

§Example

use gglib_core::domain::InferenceConfig;

let config = InferenceConfig {
    temperature: Some(0.8),
    top_p: Some(0.9),
    top_k: None,
    max_tokens: Some(1024),
    repeat_penalty: None,
};

let args = config.to_cli_args();
assert_eq!(args, vec!["--temp", "0.8", "--top-p", "0.9", "-n", "1024"]);
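
For readers curious how such a conversion might work, here is a minimal sketch built only from the flag names listed above. It is not the crate's source: the struct is a local stand-in for InferenceConfig, and push_flag and to_cli_args_sketch are hypothetical names.

```rust
// Local stand-in for gglib_core::domain::InferenceConfig (assumption).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct InferenceConfig {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub top_k: Option<i32>,
    pub max_tokens: Option<u32>,
    pub repeat_penalty: Option<f32>,
}

// Hypothetical helper: append `flag value` only when the field is set.
fn push_flag<T: ToString>(args: &mut Vec<String>, flag: &str, value: Option<T>) {
    if let Some(v) = value {
        args.push(flag.to_string());
        args.push(v.to_string());
    }
}

// Hypothetical equivalent of `to_cli_args`, emitting flags in field order.
pub fn to_cli_args_sketch(config: &InferenceConfig) -> Vec<String> {
    let mut args = Vec::new();
    push_flag(&mut args, "--temp", config.temperature);
    push_flag(&mut args, "--top-p", config.top_p);
    push_flag(&mut args, "--top-k", config.top_k);
    push_flag(&mut args, "-n", config.max_tokens);
    push_flag(&mut args, "--repeat-penalty", config.repeat_penalty);
    args
}
```

Keeping the flag mapping in one function means the CLI builder and the server startup path cannot drift apart on flag names.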