How does llama.cpp manage the memory?

Hello, I'm wondering how llama.cpp manages memory:
1. Does llama.cpp allocate space for all tensors, both static parameter tensors and temporary tensors, at once? When I added up the allocations, I only found those for parameter tensors (e.g. **blk.0.attn_q.weight**) and none for temporary tensors (e.g. **inp_embd**). Could you please explain where the temporary tensors are allocated?
2. Does llama.cpp free all tensors, both parameter tensors and temporary tensors, at the end of model inference? If so, where in the code are they freed?

I hope I have described my confusion clearly. Thanks for your attention.

How does llama.cpp manage the memory? #6323