llama server fails to find Intel GPU and crashes #1933
Replies: 35 comments
-
Just curious, did you test this directly with llama.cpp as well? I'm finding that the error comes directly from llama.cpp, so it would be good to test it at the engine level first.
-
Thanks for the quick reply. I've tried to run with:

> ramalama --nocontainer serve tinyllama
Traceback (most recent call last):
  File "/home/charles/.local/share/pipx/venvs/ramalama/libexec/ramalama/ramalama-serve-core", line 16, in <module>
    main(sys.argv[1:])
  File "/home/charles/.local/share/pipx/venvs/ramalama/libexec/ramalama/ramalama-serve-core", line 8, in main
    from ramalama.common import exec_cmd
ModuleNotFoundError: No module named 'ramalama'

but this may be a completely different install error...
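One thing that might rule out a broken pipx environment (an assumption on my side, not something from the thread; the paths above suggest ramalama was installed via pipx) is forcing a reinstall before retrying:

# standard pipx commands; recreate the venv that holds the ramalama package
> pipx reinstall ramalama
# or, equivalently:
> pipx install --force ramalama
> ramalama --nocontainer serve tinyllama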
-
Unfortunately not. RamaLama uses llama.cpp or vLLM under the hood, and since the error came directly from llama.cpp instead of RamaLama, it would be good if you could test it directly in llama.cpp first. @ericcurtin Do you know who maintains the compatibility table and if they can test this? I was wondering whether RamaLama is failing to pass through the GPU or llama.cpp is failing to detect the GPU.
-
I understand the logic, but I would need more context regarding the llama.cpp install in order to make use of the Intel GPU. Moreover, I'm just starting on a freshly installed system and would appreciate keeping it somewhat clean of one-off installs (I was pretty happy about the container solution). I could also try pulling a LocalAI image and running a server; LocalAI also depends on llama.cpp and should fail the same way if llama.cpp is the culprit (provided I find an image with the same llama.cpp version).
-
I feel you! Unfortunately I don't have an Intel GPU to test this problem, so I can only count on you haha.
Please update your result once you have it. We can also try spinning up an ephemeral container with the required build tools already installed to quickly test llama.cpp's build with the Intel GPU, and tear it down once complete.
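A minimal sketch of such an ephemeral test, assuming the docker.io/intel/oneapi-basekit image (which ships the oneAPI compilers and sycl-ls; git/cmake may need an apt-get install inside it) and the SYCL flags from llama.cpp's build docs:

> podman run --rm -it --device /dev/dri docker.io/intel/oneapi-basekit:latest bash
# inside the container:
> git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
> source /opt/intel/oneapi/setvars.sh
> cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
> cmake --build build --config Release -j
> sycl-ls                                  # the Arc iGPU should show up as a level_zero device
> ./build/bin/llama-cli -m /path/to/any-model.gguf -p "Hello" -n 16 -ngl 99   # smoke test with full offload

Since the container is started with --rm, everything is torn down as soon as you exit the shell.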
-
Just a follow-up on this: I installed a LocalAI docker image (details below) and I could run a mistral-7b model making use of my Intel GPU with llama.cpp as the backend (verified via intel_gpu_top).

> docker image list
REPOSITORY        TAG               IMAGE ID       CREATED      SIZE
localai/localai   v3.0.0-sycl-f32   1e973dbe4525   7 days ago   16.5GB

> docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED        STATUS                   PORTS                                       NAMES
2e995866de10   localai/localai:v3.0.0-sycl-f32   "/build/entrypoint.s…"   16 hours ago   Up 5 minutes (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   local-ai

> docker exec local-ai sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) Graphics 12.71.4 [1.6.32567+18]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 165H OpenCL 3.0 (Build 0) [2025.19.3.0.17_230222]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [25.05.32567]

> docker exec local-ai ls /dev/dri
card1
renderD128

I'm not sure I understand how the image is built, but after a bit of searching it seems it's pinned to this commit of llama.cpp (a commit you may find familiar ;) ):

> docker exec -ti local-ai grep -rnwi '.' -e 'CPPLLAMA_VERSION'
./Makefile:9:CPPLLAMA_VERSION?=8d947136546773f6410756f37fcc5d3e65b8135d
+ some other hits
-
Hey @taronaeo, I struggle to keep up with all the issues, PRs, etc. The community maintains the compatibility table, because there's such a wide array of hardware that no single person can test all of it, so we depend on the community to test and update the table as appropriate. For me, when I'm enabling hardware, I find the best route is the one you suggested: get it working with just llama.cpp, no containers (a lot of the issues are at the llama.cpp level), then make it play nicely with RamaLama and containers.
-
Alternatively, as I mentioned earlier, "We can also try spinning up an ephemeral container with the required build tools already installed to quickly test llama.cpp's build with the Intel GPU, and tear it down once complete." I think this would provide a clearer answer as to which component is failing, and we can narrow down further from there.
-
Yep, I couldn't find the actual backend used for Intel GPUs in RamaLama's docs either. Sure, happy to try the containerized build of llama.cpp if this can be helpful, but I would need more guidance. I'm a bit lost regarding backend selection and how to write a minimal test (just using llama-cli?) afterwards. The llama.cpp build documentation doesn't mention Linux as an option for Vulkan-based builds. The SYCL backend seems to work in Docker as mentioned above (I have not tested running the docker container via podman). Or did you want to test even a simple CPU backend?
-
It does mention it, but admittedly it's a little hidden: it's under the "Without Docker" header, below the MSYS2 section. TL;DR, you can build most llama.cpp backends the same way:
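A minimal sketch of that generic pattern, assuming the current GGML_* backend switches from llama.cpp's build docs (pick one configure line for the backend you want):

> cmake -B build -DGGML_VULKAN=ON
# or, for SYCL, after sourcing the oneAPI environment:
# cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
> cmake --build build --config Release -j
> ./build/bin/llama-cli -m /path/to/model.gguf -p "Hello" -n 16 -ngl 99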
-
I'm also struggling with GPU acceleration on my machine. Maybe have a look at https://github.com/eleiton/ollama-intel-arc and how this project incorporates ipex-llm. This docker image works for me, but I would prefer to use ramalama if ipex-llm support is possible.
-
In fact, Vulkan is one of the most mature llama.cpp backends on Linux.
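A quick host-side sanity check that the Vulkan stack actually sees the iGPU (assuming Ubuntu's vulkan-tools package; not something the thread ran):

> sudo apt-get install vulkan-tools   # provides vulkaninfo
> vulkaninfo --summary                # the Intel Arc iGPU should be listed as a physical device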
-
I'm speculating, but try podman. What is the OS here? We don't test docker often.
-
I'm running this in a … Podman also fails; as before, the Vulkan backend works just fine.
-
Could you try …
-
@afazekas PTAL
-
(ramalama-0.11.0-1.fc42.noarch), podman. Both the vulkan and the intel-gpu images worked on my laptop. Probably a good idea to try to run the Intel diag tools from a similar container to see if it at least lists the device.
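One way to do that check, assuming the Intel oneAPI base image (sycl-ls is not shipped in the ramalama intel-gpu image, as a later comment shows) and the same --device flag RamaLama passes:

> podman run --rm --device /dev/dri docker.io/intel/oneapi-basekit:latest sycl-ls
# the iGPU should appear as a level_zero:gpu and/or opencl:gpu entry; if it doesn't,
# the problem is at the device-passthrough/permission level rather than inside llama.cpp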
-
This is still broken for me on the latest … I tried to start the podman container manually, but it just hangs.
-
Have you tried to run …
-
Hi,
just tried what @rhatdan asked and it works:

> podman run docker.io/intelanalytics/ipex-llm-inference-cpp-xpu:latest sycl-ls
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 165H OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]

On the other hand:

> podman run quay.io/ramalama/intel-gpu:latest sycl-ls
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.2.37(1)-release
args: Using "$@" for setvars.sh arguments: sycl-ls
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
/usr/bin/entrypoint.sh: line 6: exec: sycl-ls: not found
-
Coming back to this, I think the issue was hiding in plain sight, and is related to the permissions set on /dev/dri/renderXXX.

First, coming back to my previous comment: I actually forgot to pass the --device flag:

> podman run --device /dev/dri/ docker.io/intelanalytics/ipex-llm-inference-cpp-xpu:latest sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) Graphics 12.71.4 [1.6.32224.500000]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 165H OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.52.32224.5]

Now coming back to our issue with ramalama: mudler/LocalAI#3437 mentioned rendering device permissions. In that issue the permissions on the render node were the suspect, and on my host they look like this:

> ls -la /dev/dri
crw-rw----+ 1 root video  226,   1 Sep 11 09:18 card1
crw-rw----+ 1 root render 226, 128 Sep 11 09:18 renderD128

Trying to give rw access to others on the render node:

> sudo chmod o+rw /dev/dri/renderD128
> ramalama serve tinyllama
:: initializing oneAPI environment ...
entrypoint.sh: BASH_VERSION = 5.2.37(1)-release
args: Using "$@" for setvars.sh arguments: llama-server --port 8084 --model /mnt/models/tinyllama --no-warmup --jinja --chat-template-file /mnt/models/chat_template_converted --log-colors --alias tinyllama --ctx-size 2048 --temp 0.8 --cache-reuse 256 -ngl 999 --threads 11 --host 0.0.0.0
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
error while handling argument "--log-colors": error: unkown value for --log-colors: '--alias'
usage:
  --log-colors [on|off|auto]  Set colored logging ('on', 'off', or 'auto', default: 'auto')
                              'auto' enables colors when output is to a terminal
                              (env: LLAMA_LOG_COLORS)
to show complete usage, run with -h

so the run now gets past the GPU error and fails on an unrelated llama-server argument-parsing issue instead. I'm however not satisfied with this solution, which seems insecure. And I guess running the podman container with it added to the render group would be preferable.
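A sketch of that group-based alternative, which the rest of the thread converges on (standard commands; --group-add keep-groups needs the crun runtime, podman's default on recent distributions):

> sudo chmod o-rw /dev/dri/renderD128      # undo the world-writable workaround
> sudo usermod -aG render $USER            # join the group that owns the render node
# log out and back in, then verify:
> id -nG | grep -w render
> ls -la /dev/dri/renderD128
# when launching the container manually, keep the supplementary groups:
> podman run --rm --device /dev/dri --group-add keep-groups quay.io/ramalama/intel-gpu:latest ls -la /dev/dri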
-
Have you already tried using …
-
I had done it at the podman level, and tried several variations on group-related commands, starting from the command spat out by ramalama --dry-run.
-
Is your user part of the render group?
-
Nope, I should have pointed this out indeed. The default …
-
How does it work on the host? Can your user access the device file?
-
If the user cannot access the device on the host, it won't automatically get access to it in the container. If access to the device is limited to root (and you want to keep it that way), then the container must be rootful.
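A quick host-side check along those lines (plain coreutils, nothing ramalama-specific):

> stat -c '%U:%G %a' /dev/dri/renderD128   # owner, group and mode of the render node
> id -nG                                   # groups the current user belongs to
> test -r /dev/dri/renderD128 && test -w /dev/dri/renderD128 && echo "render node accessible" || echo "no access"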
-
I just found this Intel doc on setting up permissions: https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-hpc-cluster/2025-1/step-4-set-up-user-permissions-for-using-the.html#SET-PERMISSIONS

It states that users need to be added to the render group to access the GPU device files. So I guess I'll just add my current user to the render group. This leaves two questions: …
-
OK, this looks like a configuration issue, so moving to discussion.
-
Just adding my user to the render group was not enough; I also had to add "--keep-groups" to get rid of the error. However, this just gets rid of the error but still fails to use the GPU. If I set up a ramalama server and then query the API from the host via:

curl http://localhost:8087/v1/completions -d '{
  "model": "qwen3:4b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

the inference runs with a single CPU core at 100% and no GPU usage (monitored via gputop).
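A few checks that might narrow down why inference stays on the CPU (standard tools only; the container name below is a placeholder, take it from podman ps):

> sudo intel_gpu_top                              # watch the iGPU while a completion runs
> podman ps                                       # find the running ramalama container
> podman logs <container-name>                    # do the llama-server startup logs list a SYCL/Level-Zero device, or fall back to CPU?
> podman exec <container-name> ls -la /dev/dri    # is the render node visible (and readable) inside the container?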
-
Issue Description
I'd like to use my Intel iGPU (my CPU is bundled with an iGPU, ID55 Intel Arc Meteor Lake, which is listed in ramalama's compatibility table) to run local chats, on an Ubuntu 24.04 system with podman installed.
I however fail to serve/run any model: the container crashes upon start and doesn't even get listed in podman ps --all afterward. When using ramalama run amodel, the terminal hangs until I enter any command and then fails with Error: could not connect to: http://127.0.0.1:8080/v1/chat/completions, no matter how long I wait before entering an input. After reading #1568 I've tried ramalama serve amodel instead and get … Any pointers on how to debug?
Thanks for the help!
Steps to reproduce the issue
Describe the results you received
This fails with the following output:
It seems that ramalama has downloaded the correct image for my config:
> podman images
REPOSITORY                   TAG   IMAGE ID       CREATED      SIZE
quay.io/ramalama/intel-gpu   0.9   eac7acb20df9   6 days ago   3.3 GB

Using --dry-run to get the podman command I get:

> ramalama --dry-run serve tinyllama
podman run --rm --label ai.ramalama.model=tinyllama --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8082 --label ai.ramalama.command=serve --device /dev/dri --device /dev/accel -e INTEL_VISIBLE_DEVICES=1 -p 8082:8082 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --label ai.ramalama --name ramalama_k9gI9kV2vt --env=HOME=/tmp --init --mount=type=bind,src=/home/charles/.local/share/ramalama/store/ollama/tinyllama/tinyllama/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816,destination=/mnt/models/model.file,ro --mount=type=bind,src=/home/charles/.local/share/ramalama/store/ollama/tinyllama/tinyllama/snapshots/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816/chat_template_converted,destination=/mnt/models/chat_template.file,ro quay.io/ramalama/intel-gpu:0.9 /usr/libexec/ramalama/ramalama-serve-core llama-server --port 8082 --model /mnt/models/model.file --no-warmup --jinja --log-colors --alias tinyllama --ctx-size 2048 --temp 0.8 --cache-reuse 256 -ngl 999 --threads 11 --host 0.0.0.0

I've tried to create a container without the llama-server launch command:
> podman run --rm --label ai.ramalama.model=tinyllama --label ai.ramalama.engine=podman --label ai.ramalama.runtime=llama.cpp --label ai.ramalama.port=8082 --label ai.ramalama.command=serve --device /dev/dri --device /dev/accel -e INTEL_VISIBLE_DEVICES=1 -p 8082:8082 --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --label ai.ramalama --name ramalama_k9gI9kV2vt --env=HOME=/tmp --init --mount=type=bind,src=/home/charles/.local/share/ramalama/store/ollama/tinyllama/tinyllama/blobs/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816,destination=/mnt/models/model.file,ro --mount=type=bind,src=/home/charles/.local/share/ramalama/store/ollama/tinyllama/tinyllama/snapshots/sha256-2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816/chat_template_converted,destination=/mnt/models/chat_template.file,ro quay.io/ramalama/intel-gpu:0.9

and that gets created without error. The GPU devices seem to be correctly passed to the container:
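One way to do that check, assuming the container created above is still running (the name comes from the --name flag in the dry-run command):

> podman exec ramalama_k9gI9kV2vt ls -la /dev/dri
> podman exec ramalama_k9gI9kV2vt ls -la /dev/accel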
Describe the results you expected
Well, something that finds my iGPU and doesn't crash :)
And if it doesn't find my GPU, I would expect it to still run on the CPU without crashing, but issue a warning.
ramalama info output
Upstream Latest Release
No
Additional environment details
No response
Additional information
I have stumbled upon a similar issue in the LocalAI project.
sycl-ls is not installed in the ramalama container, and I did not know how to interact with the Intel APIs from bash with the tools already installed in the container.

Passing the complete device names --device /dev/dri/card1 and --device /dev/dri/renderD128 along with ramalama serve did not help.