InferenceConfig

Struct InferenceConfig 

pub struct InferenceConfig {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub top_k: Option<i32>,
    pub max_tokens: Option<u32>,
    pub repeat_penalty: Option<f32>,
    pub presence_penalty: Option<f32>,
    pub min_p: Option<f32>,
}

Inference parameters for LLM sampling.

All fields are optional to support partial configuration and fallback chains. Intended to be shared across model defaults, global settings, and request overrides.

§Hierarchy Resolution

When making an inference request, parameters are resolved in this order:

  1. Request-level override (user specified for this request)
  2. Per-model defaults (stored in Model.inference_defaults)
  3. Global settings (stored in Settings.inference_defaults)
  4. Hardcoded fallback (e.g., temperature = 0.7)
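
As a rough illustration of this order, the sketch below resolves each field by taking the first Some value found from the highest-priority layer down. It uses a hypothetical two-field stand-in for InferenceConfig and a hypothetical fill_from helper built on Option::or, so it compiles without gglib_core; apart from temperature = 0.7, the fallback values are made up.

```rust
// Hypothetical two-field stand-in for gglib_core::domain::InferenceConfig.
#[derive(Clone, Debug, Default, PartialEq)]
struct InferenceConfig {
    temperature: Option<f32>,
    top_p: Option<f32>,
}

impl InferenceConfig {
    /// Hypothetical helper: fill each `None` field of `self` from `lower`.
    /// Fields already set on `self` are kept.
    fn fill_from(&mut self, lower: &Self) {
        self.temperature = self.temperature.or(lower.temperature);
        self.top_p = self.top_p.or(lower.top_p);
    }
}

/// Resolve layers ordered from highest to lowest priority:
/// request, per-model, global, hardcoded.
fn resolve(layers: &[InferenceConfig]) -> InferenceConfig {
    let mut resolved = InferenceConfig::default();
    for layer in layers {
        resolved.fill_from(layer);
    }
    resolved
}

fn main() {
    let request = InferenceConfig { temperature: Some(0.3), ..Default::default() };
    let model = InferenceConfig { top_p: Some(0.8), ..Default::default() };
    let global = InferenceConfig { temperature: Some(0.9), top_p: Some(0.95) };
    // Made-up fallback values, except temperature = 0.7 which the docs mention.
    let hardcoded = InferenceConfig { temperature: Some(0.7), top_p: Some(0.9) };

    let resolved = resolve(&[request, model, global, hardcoded]);
    assert_eq!(resolved.temperature, Some(0.3)); // request-level override wins
    assert_eq!(resolved.top_p, Some(0.8)); // per-model default fills the gap
    println!("{resolved:?}");
}
```

Because each layer only fills gaps left by the layers above it, a request can override a single parameter without restating the rest.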

§Examples

use gglib_core::domain::InferenceConfig;

// Conservative settings for code generation
let code_gen = InferenceConfig {
    temperature: Some(0.2),
    top_p: Some(0.9),
    top_k: Some(40),
    max_tokens: Some(2048),
    repeat_penalty: Some(1.1),
    presence_penalty: None,
    min_p: None,
};

// Creative writing settings
let creative = InferenceConfig {
    temperature: Some(1.2),
    top_p: Some(0.95),
    ..Default::default()
};

Fields§

§temperature: Option<f32>

Sampling temperature (0.0 - 2.0).

Controls randomness in token selection:

  • Lower values (0.1-0.5): More deterministic, focused
  • Medium values (0.7-1.0): Balanced creativity
  • Higher values (1.1-2.0): More random, creative
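
To make the effect concrete, here is a minimal sketch of temperature scaling as it is commonly implemented: logits are divided by the temperature before the softmax, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. The function name softmax_with_temperature is hypothetical, and the sketch assumes a temperature greater than 0.0.

```rust
/// Hypothetical helper: softmax over `logits` after dividing by `temperature`.
/// Assumes `temperature > 0.0`.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|&z| z / temperature).collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let logits = [2.0_f32, 1.0, 0.5];
    let cold = softmax_with_temperature(&logits, 0.2);
    let hot = softmax_with_temperature(&logits, 1.5);
    // A lower temperature puts more probability on the top token.
    assert!(cold[0] > hot[0]);
    println!("T=0.2: {cold:?}\nT=1.5: {hot:?}");
}
```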

§top_p: Option<f32>

Nucleus sampling threshold (0.0 - 1.0).

Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches this threshold. Common values: 0.9 (default), 0.95 (more diverse).
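
A minimal sketch of nucleus filtering over an already normalized probability vector. top_p_filter is a hypothetical helper, not part of gglib_core; it returns the indices of the tokens that remain eligible for sampling.

```rust
/// Hypothetical helper: indices of the tokens kept by nucleus (top-p) filtering.
/// `probs` is assumed to be a normalized probability distribution.
fn top_p_filter(probs: &[f32], top_p: f32) -> Vec<usize> {
    // Order token indices from most to least likely.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    let mut kept = Vec::new();
    let mut cumulative = 0.0_f32;
    for i in order {
        kept.push(i);
        cumulative += probs[i];
        // Stop as soon as the kept set covers at least `top_p` of the mass.
        if cumulative >= top_p {
            break;
        }
    }
    kept
}

fn main() {
    let probs = [0.5_f32, 0.3, 0.15, 0.05];
    // 0.5 + 0.3 = 0.8 < 0.9, so the 0.15 token is also needed.
    assert_eq!(top_p_filter(&probs, 0.9), vec![0, 1, 2]);
    // The top token alone already covers 0.5 of the mass.
    assert_eq!(top_p_filter(&probs, 0.5), vec![0]);
}
```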

§top_k: Option<i32>

Top-K sampling limit.

Considers only the K most likely next tokens. Common values: 40 (default), 10 (focused), 100 (diverse).
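
A minimal sketch of top-k selection. top_k_indices is a hypothetical helper; because the field is an i32, the sketch assumes that a value of 0 or below disables the filter, as llama.cpp documents for 0.

```rust
/// Hypothetical helper: indices of the `k` most likely tokens, best first.
/// A `k` of 0 or below is treated as "disabled" and keeps every token.
fn top_k_indices(logits: &[f32], k: i32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    if k > 0 {
        // `truncate` is a no-op when k exceeds the vocabulary size.
        order.truncate(k as usize);
    }
    order
}

fn main() {
    let logits = [0.1_f32, 2.5, -1.0, 1.7];
    assert_eq!(top_k_indices(&logits, 2), vec![1, 3]);
    assert_eq!(top_k_indices(&logits, 0), vec![1, 3, 0, 2]);
}
```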

§max_tokens: Option<u32>

Maximum tokens to generate in response.

Hard limit on response length. Does not include input tokens.

§repeat_penalty: Option<f32>

Repetition penalty (> 0.0, typically 1.0 - 1.3).

Penalizes repeated tokens to reduce repetitive output.

  • 1.0: No penalty (default)
  • 1.1-1.3: Moderate penalty
  • > 1.3: Strong penalty (may hurt coherence)
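
The sketch below shows one common way this penalty is applied, modeled on llama.cpp's behavior: for each token already seen, a positive logit is divided by the penalty and a non-positive logit is multiplied by it, so both become less likely and a value of 1.0 changes nothing. apply_repeat_penalty is a hypothetical name, and history stands in for the recent-token window (llama.cpp sizes that window with --repeat-last-n).

```rust
use std::collections::HashSet;

/// Hypothetical helper: apply a llama.cpp-style repetition penalty in place.
/// Each distinct token id in `history` is penalized once.
fn apply_repeat_penalty(logits: &mut [f32], history: &[usize], penalty: f32) {
    let seen: HashSet<usize> = history.iter().copied().collect();
    for id in seen {
        if let Some(logit) = logits.get_mut(id) {
            if *logit > 0.0 {
                *logit /= penalty; // shrink positive logits toward zero
            } else {
                *logit *= penalty; // push non-positive logits further down
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0_f32, -1.0, 0.5];
    // Tokens 0 and 1 were seen; token 2 was not.
    apply_repeat_penalty(&mut logits, &[0, 1, 0], 1.25);
    assert_eq!(logits, vec![1.6, -1.25, 0.5]);
}
```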

§presence_penalty: Option<f32>

Presence penalty (0.0 - 2.0).

Penalizes tokens that have already appeared in the output, encouraging the model to cover new ground. Effective at preventing repetitive reasoning loops in thinking models.

  • 0.0: No penalty (default; disabled)
  • 1.5: Recommended for reasoning/thinking models (e.g. Qwen3.6, DeepSeek-R1)
  • 2.0: Avoid; may degrade coherence
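
Presence penalty is usually defined as a flat subtraction from the logit of every token that has appeared at least once, independent of how often it appeared (unlike a frequency penalty, which scales with the count). A minimal sketch under that definition; apply_presence_penalty is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Hypothetical helper: subtract `presence_penalty` once from the logit of
/// every token id that already appears in `generated`.
fn apply_presence_penalty(logits: &mut [f32], generated: &[usize], presence_penalty: f32) {
    let seen: HashSet<usize> = generated.iter().copied().collect();
    for id in seen {
        if let Some(logit) = logits.get_mut(id) {
            *logit -= presence_penalty;
        }
    }
}

fn main() {
    let mut logits = vec![3.0_f32, 1.0, 0.5];
    // Token 0 appeared twice and token 2 once; each is penalized exactly once.
    apply_presence_penalty(&mut logits, &[0, 0, 2], 1.5);
    assert_eq!(logits, vec![1.5, 1.0, -1.0]);
}
```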

§min_p: Option<f32>

Minimum-probability sampling threshold (0.0 - 1.0).

Removes tokens whose probability is below min_p × P(top token).

  • 0.0: Disabled (explicit off; recommended by Qwen3.6)
  • 0.05: llama.cpp built-in default when the flag is omitted
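
A minimal sketch of the min_p rule above, working on normalized probabilities. min_p_filter is a hypothetical helper that returns the indices of the surviving tokens.

```rust
/// Hypothetical helper: keep tokens whose probability is at least
/// `min_p * max_probability`. A `min_p` of 0.0 keeps every token.
fn min_p_filter(probs: &[f32], min_p: f32) -> Vec<usize> {
    let p_max = probs.iter().copied().fold(0.0_f32, f32::max);
    let cutoff = min_p * p_max;
    (0..probs.len()).filter(|&i| probs[i] >= cutoff).collect()
}

fn main() {
    let probs = [0.60_f32, 0.25, 0.10, 0.04, 0.01];
    // Cutoff = 0.05 * 0.60 = 0.03, so only the 0.01 token is dropped.
    assert_eq!(min_p_filter(&probs, 0.05), vec![0, 1, 2, 3]);
    // Cutoff = 0.2 * 0.60 = 0.12, keeping only the two largest.
    assert_eq!(min_p_filter(&probs, 0.2), vec![0, 1]);
    // 0.0 disables the filter.
    assert_eq!(min_p_filter(&probs, 0.0).len(), probs.len());
}
```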

Implementations§

impl InferenceConfig

pub const fn merge_with(&mut self, other: &Self)

Merge another config into this one, keeping values already set on self.

For each field, if self is None, take other's value; otherwise keep self's value. Calling this on a higher-priority config with each lower-priority layer in turn applies the fallback chain.

§Example
use gglib_core::domain::InferenceConfig;

let mut request = InferenceConfig {
    temperature: Some(0.8),
    ..Default::default()
};

let model_defaults = InferenceConfig {
    temperature: Some(0.5),
    top_p: Some(0.9),
    ..Default::default()
};

request.merge_with(&model_defaults);
assert_eq!(request.temperature, Some(0.8)); // Request value wins
assert_eq!(request.top_p, Some(0.9));      // Fallback to model default

pub const fn with_hardcoded_defaults() -> Self

Create a config with every field set to a hardcoded fallback value.

These are the values used at the bottom of the resolution hierarchy, when neither the request, the model, nor the global settings supply one.

pub fn to_cli_args(&self) -> Vec<String>

Convert this config into llama.cpp command-line arguments.

Returns a vector of argument strings suitable for passing to llama-server, using llama.cpp's flag names: --temp, --top-p, --top-k, -n, --repeat-penalty. Fields that are None produce no arguments.

This is the single source of truth for CLI flag conversion, used by:

  • LlamaCommandBuilder (for CLI commands)
  • GUI server startup (via ServerConfig.extra_args)

§Example
use gglib_core::domain::InferenceConfig;

let config = InferenceConfig {
    temperature: Some(0.8),
    top_p: Some(0.9),
    top_k: None,
    max_tokens: Some(1024),
    repeat_penalty: None,
    presence_penalty: None,
    min_p: None,
};

let args = config.to_cli_args();
assert_eq!(args, vec!["--temp", "0.8", "--top-p", "0.9", "-n", "1024"]);
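
To show how the Option fields map onto flags, here is a sketch of the conversion on a hypothetical five-field stand-in for InferenceConfig. It is not gglib_core's implementation: it covers only the five flags listed above, relies on f32's Display output (0.8 formats as "0.8"), and uses a hypothetical push_flag helper.

```rust
// Hypothetical five-field stand-in for InferenceConfig; not gglib_core's code.
#[derive(Debug, Default)]
struct InferenceConfig {
    temperature: Option<f32>,
    top_p: Option<f32>,
    top_k: Option<i32>,
    max_tokens: Option<u32>,
    repeat_penalty: Option<f32>,
}

/// Hypothetical helper: append `flag value` only when `value` is `Some`.
fn push_flag<T: ToString>(args: &mut Vec<String>, flag: &str, value: Option<T>) {
    if let Some(v) = value {
        args.push(flag.to_string());
        args.push(v.to_string());
    }
}

impl InferenceConfig {
    fn to_cli_args(&self) -> Vec<String> {
        let mut args = Vec::new();
        push_flag(&mut args, "--temp", self.temperature);
        push_flag(&mut args, "--top-p", self.top_p);
        push_flag(&mut args, "--top-k", self.top_k);
        push_flag(&mut args, "-n", self.max_tokens);
        push_flag(&mut args, "--repeat-penalty", self.repeat_penalty);
        args
    }
}

fn main() {
    let config = InferenceConfig {
        temperature: Some(0.8),
        top_p: Some(0.9),
        max_tokens: Some(1024),
        ..Default::default()
    };
    // Matches the documented example: None fields are skipped.
    assert_eq!(config.to_cli_args(), vec!["--temp", "0.8", "--top-p", "0.9", "-n", "1024"]);
}
```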

pub const fn reasoning_profile() -> Self

Return a recommended InferenceConfig profile for reasoning / thinking models.

Applied automatically at import time when the "reasoning" capability tag is detected (e.g. Qwen3.6, DeepSeek-R1, QwQ). Values follow the upstream Qwen3.6 guidance for thinking mode on general tasks and are intended to be conservative enough to work well across thinking-capable models.

| Parameter        | Value | Rationale                                                  |
|------------------|-------|------------------------------------------------------------|
| temperature      | 1.0   | Recommended thinking-mode baseline                         |
| top_p            | 0.95  | Broad nucleus; standard for reasoning                      |
| top_k            | 20    | Tighter than the 40 fallback; suppresses low-quality tokens |
| max_tokens       | 8192  | Safe out-of-the-box ceiling; increase for complex tasks    |
| repeat_penalty   | 1.0   | No penalty; presence_penalty handles anti-repetition       |
| presence_penalty | 1.5   | Prevents repetitive reasoning loops                        |
| min_p            | 0.0   | Explicitly disabled per Qwen3.6 spec                       |

Users can override any parameter with gglib model update <id> --<flag> or the equivalent UI control.
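
A sketch of the profile expressed as a const fn on a hypothetical stand-in struct, assuming every parameter listed above maps to Some(value), together with a single-parameter override using struct update syntax. It mirrors, but is not, the crate's reasoning_profile.

```rust
// Hypothetical stand-in for gglib_core::domain::InferenceConfig.
#[derive(Clone, Debug, Default, PartialEq)]
struct InferenceConfig {
    temperature: Option<f32>,
    top_p: Option<f32>,
    top_k: Option<i32>,
    max_tokens: Option<u32>,
    repeat_penalty: Option<f32>,
    presence_penalty: Option<f32>,
    min_p: Option<f32>,
}

impl InferenceConfig {
    /// Values copied from the reasoning-profile parameters documented above.
    const fn reasoning_profile() -> Self {
        Self {
            temperature: Some(1.0),
            top_p: Some(0.95),
            top_k: Some(20),
            max_tokens: Some(8192),
            repeat_penalty: Some(1.0),
            presence_penalty: Some(1.5),
            min_p: Some(0.0),
        }
    }
}

fn main() {
    // Override only the output ceiling for a long task; keep everything else.
    let tuned = InferenceConfig {
        max_tokens: Some(32768),
        ..InferenceConfig::reasoning_profile()
    };
    assert_eq!(tuned.max_tokens, Some(32768));
    assert_eq!(tuned.presence_penalty, Some(1.5));
    println!("{tuned:?}");
}
```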

Trait Implementations§

impl Clone for InferenceConfig

fn clone(&self) -> InferenceConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for InferenceConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for InferenceConfig

fn default() -> InferenceConfig

Returns the "default value" for a type.

impl<'de> Deserialize<'de> for InferenceConfig

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for InferenceConfig

fn eq(&self, other: &InferenceConfig) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for InferenceConfig

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl StructuralPartialEq for InferenceConfig
