Merge pull request #2093 from rhatdan/VERSION

mikebonnet · web-flow · commit e3c17b088980 · 2025-10-31T11:05:18.000-07:00
Bump to v0.14.0
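
The docs touched by this bump state that the default image tag follows the minor version of the package (version 0.14.0 pulls `:0.14`). A minimal sketch of that mapping is shown here; `minor_tag` and `default_image` are hypothetical helper names used only for illustration and are not RamaLama's actual implementation.

```python
def minor_tag(version: str) -> str:
    """Return the 'major.minor' prefix of a dotted version string."""
    # Hypothetical helper: keep the first two components, drop the patch level.
    major, minor, *_ = version.split(".")
    return f"{major}.{minor}"


def default_image(version: str, repo: str = "quay.io/ramalama/ramalama") -> str:
    """Build an image reference whose tag tracks the package's minor version."""
    # Assumption: the repository name is passed in; the docs list several defaults.
    return f"{repo}:{minor_tag(version)}"


if __name__ == "__main__":
    print(default_image("0.14.0"))
```

With this sketch, a patch release such as 0.14.1 would map to the same `:0.14` tag, which matches the "minor version" wording in the man pages.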
diff --git a/docs/ramalama-bench.1.md b/docs/ramalama-bench.1.md
@@ -59,7 +59,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docs/ramalama-perplexity.1.md b/docs/ramalama-perplexity.1.md
@@ -62,7 +62,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docs/ramalama-rag.1.md b/docs/ramalama-rag.1.md
@@ -50,7 +50,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama-rag`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docs/ramalama-run.1.md b/docs/ramalama-run.1.md
@@ -73,7 +73,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docs/ramalama-serve.1.md b/docs/ramalama-serve.1.md
@@ -120,7 +120,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table above for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docs/ramalama-version.1.md b/docs/ramalama-version.1.md
@@ -18,9 +18,9 @@ Print usage message
 
 ```
 $ ramalama version
-ramalama version 0.13.0
+ramalama version 0.14.0
 $ ramalama -q version
-0.13.0
+0.14.0
 >
 ```
 ## SEE ALSO
diff --git a/docsite/docs/commands/ramalama/bench.mdx b/docsite/docs/commands/ramalama/bench.mdx
@@ -63,7 +63,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docsite/docs/commands/ramalama/convert.mdx b/docsite/docs/commands/ramalama/convert.mdx
@@ -25,14 +25,26 @@ The model can be from RamaLama model storage in Huggingface, Ollama, or a local
 
 Convert Safetensor models into a GGUF with the specified quantization format. To learn more about model quantization, read llama.cpp documentation:
 https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
-Default: Q4_K_M
 
 #### **--help**, **-h**
 Print usage message
 
+#### **--image**=IMAGE
+Image to use for model quantization when converting to GGUF format (when the `--gguf` option has been specified). The image must have the
+`llama-quantize` executable available on the `PATH`. Defaults to the appropriate `ramalama` image based on available accelerators. If no
+accelerators are available, the current `quay.io/ramalama/ramalama` image will be used.
+
 #### **--network**=*none*
 sets the configuration for network namespaces when handling RUN instructions
 
+#### **--pull**=*policy*
+Pull image policy. The default is **missing**.
+
+#### **--rag-image**=IMAGE
+Image to use when converting to GGUF format (when the `--gguf` option has been specified). The image must have the `convert_hf_to_gguf.py` script
+executable and available in the `PATH`. The script is available from the `llama.cpp` GitHub repo. Defaults to the current
+`quay.io/ramalama/ramalama-rag` image.
+
 #### **--type**=*raw* | *car*
 
 type of OCI Model Image to convert.
@@ -59,7 +71,7 @@ Successfully tagged quay.io/rhatdan/tiny:latest
 
 Generate and run an oci model with a quantized GGUF converted from Safetensors.
 ```bash
-$ ramalama --image quay.io/ramalama/ramalama-rag convert --gguf Q4_K_M hf://ibm-granite/granite-3.2-2b-instruct oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
+$ ramalama convert --gguf Q4_K_M hf://ibm-granite/granite-3.2-2b-instruct oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
 Converting /Users/kugupta/.local/share/ramalama/models/huggingface/ibm-granite/granite-3.2-2b-instruct to quay.io/kugupta/granite-3.2-q4-k-m:latest...
 Building quay.io/kugupta/granite-3.2-q4-k-m:latest...
 $ ramalama run oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
diff --git a/docsite/docs/commands/ramalama/info.mdx b/docsite/docs/commands/ramalama/info.mdx
@@ -20,14 +20,24 @@ show this help message and exit
 
 ## FIELDS
 
+The `Accelerator` field indicates the accelerator type for the machine.
+
+The `Config` field shows the list of paths to the RamaLama configuration files in use.
+
 The `Engine` field indicates the OCI container engine used to launch the container in which to run the AI Model
 
 The `Image` field indicates the default container image in which to run the AI Model
 
-The `Runtime` field indicates which backend engine is used to execute the AI model:
+The `Inference` field lists the currently used inference engine as well as a list of available engine specification and schema files used for model inference. 
+For example:
+
+    - `llama.cpp`
+    - `vllm`
+    - `mlx`
+
+The `Selinux` field indicates if SELinux is activated or not.
 
-    - `llama.cpp`: Uses the llama.cpp library for model execution
-    - `vllm`: Uses the vLLM library for model execution
+The `Shortnames` field lists the configuration files that specify AI Model short names, as well as the merged list of short names.
 
 The `Store` field indicates the directory path where RamaLama stores its persistent data, including downloaded models, configuration files, and cached data. By default, this is located in the user's local share directory.
 
diff --git a/docsite/docs/commands/ramalama/perplexity.mdx b/docsite/docs/commands/ramalama/perplexity.mdx
@@ -66,7 +66,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docsite/docs/commands/ramalama/rag.mdx b/docsite/docs/commands/ramalama/rag.mdx
@@ -56,7 +56,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama-rag`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
diff --git a/docsite/docs/commands/ramalama/run.mdx b/docsite/docs/commands/ramalama/run.mdx
@@ -77,7 +77,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table below for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
@@ -133,6 +133,9 @@ use. Using this option RamaLama will override these defaults.
 On Nvidia based GPU systems, RamaLama defaults to using the
 `nvidia-container-runtime`. Use this option to override this selection.
 
+#### **--port**, **-p**=*port*
+Port for AI Model server to listen on (default: 8080)
+
 #### **--prefix**
 Prefix for the user prompt (default: 🦭 > )
 
@@ -165,6 +168,10 @@ Pull image policy. The default is **missing**.
 #### **--rag**=
 Specify path to Retrieval-Augmented Generation (RAG) database or an OCI Image containing a RAG database
 
+#### **--rag-image**=
+The image to use to process the RAG database specified by the `--rag` option. The image must contain the `/usr/bin/rag_framework` executable, which
+will create a proxy which embellishes client requests with RAG data before passing them on to the LLM, and returns the responses.
+
 #### **--runtime-args**="*args*"
 Add *args* to the runtime (llama.cpp or vllm) invocation.
 
@@ -214,6 +221,12 @@ ramalama run --keepalive 10m file:///tmp/mymodel
 >
 ```
 
+Run command with a custom port to allow multiple models running simultaneously
+```text
+ramalama run --port 8081 granite
+>
+```
+
 ```text
 ramalama run merlinite "when is the summer solstice"
 The summer solstice, which is the longest day of the year, will happen on June ...
diff --git a/docsite/docs/commands/ramalama/serve.mdx b/docsite/docs/commands/ramalama/serve.mdx
@@ -122,7 +122,7 @@ OCI container image to run with specified AI model. RamaLama defaults to using
 images based on the accelerator it discovers. For example:
 `quay.io/ramalama/ramalama`. See the table above for all default images.
 The default image tag is based on the minor version of the RamaLama package.
-Version 0.13.0 of RamaLama pulls an image with a `:0.12` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
+Version 0.14.0 of RamaLama pulls an image with a `:0.14` tag from the quay.io/ramalama OCI repository. The --image option overrides this default.
 
 The default can be overridden in the ramalama.conf file or via the
 RAMALAMA_IMAGE environment variable. `export RAMALAMA_IMAGE=quay.io/ramalama/aiimage:1.2` tells
@@ -218,6 +218,10 @@ Specify path to Retrieval-Augmented Generation (RAG) database or an OCI Image co
  RAG support requires AI Models be run within containers, --nocontainer not supported. Docker does not support image mounting, meaning Podman support required.
 :::
 
+#### **--rag-image**=
+The image to use to process the RAG database specified by the `--rag` option. The image must contain the `/usr/bin/rag_framework` executable, which
+will create a proxy which embellishes client requests with RAG data before passing them on to the LLM, and returns the responses.
+
 #### **--runtime-args**="*args*"
 Add *args* to the runtime (llama.cpp or vllm) invocation.
 
diff --git a/docsite/docs/commands/ramalama/version.mdx b/docsite/docs/commands/ramalama/version.mdx
@@ -22,9 +22,9 @@ Print usage message
 
 ```bash
 $ ramalama version
-ramalama version 0.13.0
+ramalama version 0.14.0
 $ ramalama -q version
-0.13.0
+0.14.0
 >
 ```
 ## See Also
diff --git a/docsite/docs/configuration/conf.mdx b/docsite/docs/configuration/conf.mdx
@@ -100,6 +100,11 @@ Run RamaLama using the specified container engine.
 Valid options are: Podman and Docker
 This field can be overridden by the RAMALAMA_CONTAINER_ENGINE environment variable.
 
+**#gguf_quantization_mode**="Q4_K_M"
+
+The quantization mode used when creating OCI formatted AI Models.
+Available options: Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0.
+
 **host**="0.0.0.0"
 
 IP address for llama.cpp to listen on.
diff --git a/ramalama/version.py b/ramalama/version.py
@@ -2,7 +2,7 @@
 
 
 def version():
-    return "0.13.0"
+    return "0.14.0"
 
 
 def print_version(args):
diff --git a/rpm/ramalama.spec b/rpm/ramalama.spec
@@ -1,7 +1,7 @@
 %global pypi_name ramalama
 %global forgeurl  https://github.com/containers/%{pypi_name}
 # see ramalama/version.py
-%global version0  0.13.0
+%global version0  0.14.0
 %forgemeta
 
 %global summary   Command line tool for working with AI LLM models
@@ -36,7 +36,7 @@ BuildRequires:    python3-pytest
 BuildRequires:    python3-jinja2
 
 Provides: python3-ramalama = %{version}-%{release}
-Obsoletes: python3-ramalama < 0.13.0-1
+Obsoletes: python3-ramalama < %{version0}-1
 
 Requires: podman
 Requires: python3-jsonschema