Labels: enhancement (New feature or request)
Description
Hello, I'm wondering how llama.cpp manages memory:
- Does llama.cpp allocate space for all tensors at once, including both static parameter tensors and temporary tensors? When I accumulated the allocations, I only found the parameter tensors (e.g. blk.0.attn_q.weight) and none of the temporary tensors (e.g. inp_embd). Could you point me to where the allocation happens?
- Does llama.cpp free all tensors at the end of inference, both parameter tensors and temporary tensors? I'm also wondering where llama.cpp frees them.
I hope I described my confusion clearly; thanks for your attention.
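For reference, below is a minimal sketch of the ggml context/arena pattern I assume llama.cpp builds on: tensors are carved out of one pre-reserved buffer and released together when the context is freed. The buffer size and tensor shapes here are made up for illustration and are not taken from llama.cpp's actual loading or graph-building code.

```c
// Minimal ggml arena sketch (illustrative sizes, not llama.cpp's real ones).
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // Reserve a fixed arena up front; every tensor below lives inside it.
    struct ggml_init_params params = {
        /*.mem_size   =*/ 64 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,   // let ggml allocate the buffer itself
        /*.no_alloc   =*/ false,  // actually reserve space for tensor data
    };
    struct ggml_context * ctx = ggml_init(params);

    // "Parameter"-like and "temporary"-like tensors are allocated the same way.
    struct ggml_tensor * weight = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4096, 4096);
    struct ggml_tensor * hidden = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4096);

    printf("weight elements: %lld, hidden elements: %lld\n",
           (long long) ggml_nelements(weight), (long long) ggml_nelements(hidden));
    printf("arena used: %zu bytes\n", ggml_used_mem(ctx));

    // Freeing the context releases every tensor allocated from it at once.
    ggml_free(ctx);
    return 0;
}
```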