Releases: beehive-lab/GPULlama3.java

GPULlama3.java 0.3.3

19 Dec 08:43 · 27b16a0

📦 Installation

Maven

<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.3</version>
</dependency>

Gradle

implementation 'io.github.beehive-lab:gpu-llama3:0.3.3'

📖 Documentation | 🔗 Maven Central

GPULlama3.java 0.3.2

18 Dec 12:23 · 15d8cc8

Model Support

  • [models] Support for IBM Granite Models 3.2, 3.3 & 4.0 with FP16 and Q8 (#92)

Other Changes

  • [docs] Update docs to use SDKMAN! and point to TornadoVM 2.2.0 (#93)
  • Add JBang catalog and local usage examples to README.md (#91)
  • Add JBang script and configuration to make it easy to run (#90); a minimal sketch of the JBang mechanism follows this list
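
The JBang additions in #90 and #91 build on JBang's standard directive mechanism, sketched below: a plain Java file whose //DEPS comment resolves the Maven coordinate at launch, so no Maven or Gradle project is needed. The class body is a placeholder, since these notes do not show the project's actual entry point or script.

///usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS io.github.beehive-lab:gpu-llama3:0.3.2

// Placeholder main class: JBang fetches the dependency declared in the
// //DEPS directive above and runs this file directly, e.g.
//   jbang RunGpuLlama.java
public class RunGpuLlama {
    public static void main(String[] args) {
        System.out.println("gpu-llama3 0.3.2 resolved onto the classpath by JBang");
    }
}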

📦 Installation

Maven

<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.2</version>
</dependency>

Gradle

implementation 'io.github.beehive-lab:gpu-llama3:0.3.2'

📖 Documentation | 🔗 Maven Central

GPULlama3.java 0.3.1

11 Dec 17:19 · 89fdb7b

Model Support

  • Add compatibility method for LangChain4j and Quarkus in ModelLoader (#87)

📦 Installation

Maven

<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.1</version>
</dependency>

Gradle

implementation 'io.github.beehive-lab:gpu-llama3:0.3.1'

📖 Documentation | 🔗 Maven Central

GPULlama3.java 0.3.0

11 Dec 12:19 · 3722b09

Model Support

  • [refactor] Generalize the design of the tornadovm package to support multiple new models and types for GPU execution (#62)
  • Refactor and clean up model loaders (#58)
  • Add Support for Q8_0 Models (#59)

Bug Fixes

  • [fix] Normalization compute step for non-NVIDIA hardware (#84)

Other Changes

  • Update README to enhance TornadoVM performance section and clarify GP… (#85)
  • Simplify installation by replacing TornadoVM submodule with pre-built SDK (#82)
  • [FP16] Improved performance by fusing dequantization with compute in kernels, a 20-30% inference speedup (#78); see the sketch after this list
  • [cicd] Prevent workflows from running on forks (#83)
  • [CI][packaging] Automate the process of deploying a new release with GitHub Actions (#81)
  • [Opt] Manipulation of Q8_0 tensors with Tornado ByteArrays (#79)
  • Optimization in Q8_0 loading (#74)
  • [opt] GGUF Load Optimization for tensors in TornadoVM layout (#71)
  • Add SchedulerType support to all TornadoVM layer planners and layer… (#66)
  • Weight Abstractions (#65)
  • Bug fixes in sizes and names of GridScheduler (#64)
  • Add Maven wrapper support (#56)
  • Add changes used in Devoxx Demo (#54)
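
The fused-kernel change in #78 amounts to the pattern sketched below in plain Java; the real change lives in TornadoVM GPU kernels, and the names and single per-row scale here are illustrative only. Instead of first materializing a dequantized copy of a quantized weight row and then reducing it, the dequantization happens inline in the dot-product loop, so the quantized bytes are read once and no temporary buffer is written.

public class FusedDequantSketch {
    // Unfused: dequantize the whole row into a temporary buffer, then reduce.
    static float dotUnfused(byte[] qRow, float scale, float[] x) {
        float[] w = new float[qRow.length];
        for (int i = 0; i < qRow.length; i++) {
            w[i] = qRow[i] * scale;            // extra pass and extra memory traffic
        }
        float sum = 0f;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum;
    }

    // Fused: dequantize each weight on the fly inside the compute loop.
    static float dotFused(byte[] qRow, float scale, float[] x) {
        float sum = 0f;
        for (int i = 0; i < qRow.length; i++) {
            sum += (qRow[i] * scale) * x[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] q = {12, -3, 7, 100};
        float[] x = {0.5f, 1.0f, -2.0f, 0.25f};
        System.out.println(dotUnfused(q, 0.02f, x)); // same result either way,
        System.out.println(dotFused(q, 0.02f, x));   // with one pass instead of two
    }
}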

📦 Installation

Maven

<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.0</version>
</dependency>

Gradle

implementation 'io.github.beehive-lab:gpu-llama3:0.3.0'

📖 Documentation | 🔗 Maven Central

v0.2.2

01 Oct 17:53

What's Changed

Full Changelog: v0.2.1...v0.2.2

v0.2.1

15 Sep 16:08

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0

04 Sep 12:42

Model Support

  • Mistral – support for GGUF-format Mistral models with optimized GPU execution.
  • Qwen2.5 – GGUF-format Qwen2.5 models supported, including performance improvements for attention layers.
  • Qwen3 – compatible with GGUF-format Qwen3 models and updated integration.
  • DeepSeek-R1-Distill-Qwen-1.5B – GGUF-format DeepSeek distilled models supported for efficient inference.
  • Phi-3 – full support for GGUF-format Microsoft Phi-3 models for high-performance workloads.

What's Changed

Full Changelog: v0.1.0-beta...v0.2.0

v0.1.0-beta

30 May 07:01 · 0c9a05a

  • Llama 3 model compatibility – full support for Llama 3.0, 3.1, and 3.2 models
  • GGUF format support – native handling of GGUF model files
  • Support for FP16 models for reduced memory usage and faster computation
  • GPU acceleration on NVIDIA GPUs using both OpenCL and PTX backends
  • [Experimental] Support for Apple Silicon (M1/M2/M3) via OpenCL (subject to hardware/compiler limitations)
  • [Experimental] Initial support for Q8 and Q4 quantized models, using runtime dequantization to FP16; see the sketch after this list
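
For reference, GGUF Q8_0 stores weights in blocks of 32 signed bytes preceded by a single FP16 scale, and runtime dequantization multiplies each byte by its block's scale. The sketch below shows that layout in plain Java, using Float.float16ToFloat from Java 20+; it is illustrative and not this repository's loader code.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Q8BlockSketch {
    static final int BLOCK = 32; // GGUF Q8_0: 32 weights per block

    // One Q8_0 block is a 2-byte FP16 scale followed by 32 int8 quants.
    static void dequantizeBlock(ByteBuffer block, float[] out, int outOffset) {
        float scale = Float.float16ToFloat(block.getShort()); // Java 20+
        for (int i = 0; i < BLOCK; i++) {
            out[outOffset + i] = block.get() * scale;
        }
    }

    public static void main(String[] args) {
        // Build one synthetic block: scale = 0.5, quants = 0..31 (GGUF is little-endian).
        ByteBuffer block = ByteBuffer.allocate(2 + BLOCK).order(ByteOrder.LITTLE_ENDIAN);
        block.putShort(Float.floatToFloat16(0.5f));
        for (int i = 0; i < BLOCK; i++) {
            block.put((byte) i);
        }
        block.flip();

        float[] weights = new float[BLOCK];
        dequantizeBlock(block, weights, 0);
        System.out.println(weights[3]); // 3 * 0.5 = 1.5
    }
}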