Answering my own question: the --sleep-idle-seconds parameter unloads the model from VRAM after the specified number of idle seconds, as described in PR #18228.
The model is reloaded automatically when a new task arrives.
Note that in the UI the model still appears as "active" (a green dot next to the model name), but the VRAM is indeed freed.

For example, with a preset file preset.ini:

[MOE-qwen3-30b]
m = /home/IA/modeles/Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL.gguf
top-p = 0.80
temp = 0.7
top-k = 20
min-p = 0.0
presence-penalty = 1.0
ctx-size = 52428
ngl = 99
jinja = true
fa = on
sleep-idle-seconds = 30
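
Since the UI keeps showing the model as "active", one way to confirm that the VRAM is really released is to compare GPU memory usage before and after the idle timeout. The following is only a rough sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH; the helper names (parse_memory_used, gpu_memory_used_mib, report_idle_drop) and the 10-second margin are my own choices for illustration, not part of llama.cpp.

```python
import subprocess
import time


def parse_memory_used(csv_text: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def gpu_memory_used_mib() -> list[int]:
    """Ask nvidia-smi for the memory currently used on each GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_used(out)


def report_idle_drop(idle_seconds: int, margin_seconds: int = 10) -> None:
    """Print per-GPU memory before and after waiting past the idle timeout.

    idle_seconds should match sleep-idle-seconds in the preset (30 above).
    margin_seconds is an arbitrary safety margin (assumption).
    """
    before = gpu_memory_used_mib()
    time.sleep(idle_seconds + margin_seconds)
    after = gpu_memory_used_mib()
    for index, (b, a) in enumerate(zip(before, after)):
        print(f"GPU {index}: {b} MiB -> {a} MiB")
```

To use it, send one request so the model is loaded, then call report_idle_drop(30) and send nothing else in the meantime. If the second number is much lower than the first, the model has been unloaded even though the green dot is still shown. Other processes using the GPU will also affect these numbers, so this is an indication rather than a proof.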

Answer selected by Seven-94