Releases · beehive-lab/GPULlama3.java
GPULlama3.java 0.3.3
📦 Installation

Maven

```xml
<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.3</version>
</dependency>
```

Gradle

```groovy
implementation 'io.github.beehive-lab:gpu-llama3:0.3.3'
```

📖 Documentation | 🔗 Maven Central
GPULlama3.java 0.3.2
Model Support
- [models] Support for IBM Granite Models 3.2, 3.3 & 4.0 with FP16 and Q8 (#92)
Other Changes
- [docs] Update docs to use SDKMAN! and point to TornadoVM 2.2.0 (#93)
- Add JBang catalog and local usage examples to README.md (#91)
- Add `jbang` script and configuration to make it easy to run (#90)
📦 Installation

Maven

```xml
<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.2</version>
</dependency>
```

Gradle

```groovy
implementation 'io.github.beehive-lab:gpu-llama3:0.3.2'
```

📖 Documentation | 🔗 Maven Central
GPULlama3.java 0.3.1
Model Support
- Add compatibility method for langchain4j and quarkus in ModelLoader (#87)
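
For context on the compatibility method above: it allows integration layers such as langchain4j or Quarkus to load a model programmatically instead of going through the CLI entry point. Below is a hypothetical caller-side sketch; the import, method name, and parameter list are assumptions for illustration, not the project's exact API.

```java
import java.nio.file.Path;
// import of ModelLoader omitted; the actual package path may differ

// Hypothetical sketch of loading a model the way a langchain4j/Quarkus
// integration would. The signature below (GGUF path, context length,
// whether to load weights) is an assumption, not the confirmed API.
public class EmbeddedLoadExample {
    public static void main(String[] args) throws Exception {
        Path gguf = Path.of("models/Llama-3.2-1B-Instruct-FP16.gguf"); // example path
        var model = ModelLoader.loadModel(gguf, 4096, true);
        System.out.println("Loaded: " + model);
    }
}
```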
📦 Installation

Maven

```xml
<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.1</version>
</dependency>
```

Gradle

```groovy
implementation 'io.github.beehive-lab:gpu-llama3:0.3.1'
```

📖 Documentation | 🔗 Maven Central
GPULlama3.java 0.3.0
Model Support
- [refactor] Generalize the design of the `tornadovm` package to support multiple new models and types for GPU exec (#62)
- Refactor/cleanup model loaders (#58)
- Add Support for Q8_0 Models (#59)
Bug Fixes
- [fix] Normalization compute step for non-nvidia hardware (#84)
Other Changes
- Update README to enhance TornadoVM performance section and clarify GP… (#85)
- Simplify installation by replacing TornadoVM submodule with pre-built SDK (#82)
- [FP16] Improved performance by fusing dequantize with compute in kernels: 20-30% Inference Speedup (#78); an illustrative sketch follows this list
- [cicd] Prevent workflows from running on forks (#83)
- [CI][packaging] Automate process of deploying a new release with Github actions (#81)
- [Opt] Manipulation of Q8_0 tensors with Tornado `ByteArrays` (#79)
- Optimization in Q8_0 loading (#74)
- [opt] GGUF Load Optimization for tensors in TornadoVM layout (#71)
- Add `SchedulerType` support to all TornadoVM layer planners and layer… (#66)
- Weight Abstractions (#65)
- Bug fixes in sizes and names of GridScheduler (#64)
- Add Maven wrapper support (#56)
- Add changes used in Devoxx Demo (#54)
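
For intuition on the fusion change in #78: rather than running one kernel that dequantizes FP16 weights into a temporary buffer and a second kernel that consumes it, the dequantization happens inline in the compute loop. Below is a plain-Java sketch of the idea under assumed names and layouts; the project's real kernels are written against the TornadoVM API.

```java
// Illustrative plain-Java sketch of the kernel-fusion idea behind #78.
// Not the project's actual TornadoVM kernels; names and layouts are assumed.
public final class FusedDequantSketch {

    // Unfused: dequantize into a temporary buffer, then compute.
    // On a GPU this means two kernels and an extra trip through device memory.
    static float dotUnfused(short[] weightsFp16, float[] x) {
        float[] w = new float[weightsFp16.length];
        for (int i = 0; i < weightsFp16.length; i++) {
            w[i] = Float.float16ToFloat(weightsFp16[i]); // pass 1: dequantize
        }
        float acc = 0f;
        for (int i = 0; i < w.length; i++) {
            acc += w[i] * x[i];                          // pass 2: compute
        }
        return acc;
    }

    // Fused: dequantize inline inside the compute loop.
    // One pass over memory, no intermediate buffer.
    static float dotFused(short[] weightsFp16, float[] x) {
        float acc = 0f;
        for (int i = 0; i < weightsFp16.length; i++) {
            acc += Float.float16ToFloat(weightsFp16[i]) * x[i];
        }
        return acc;
    }
}
```

On a GPU, eliminating the intermediate buffer saves a full round trip through device memory for the dequantized weights, which is the kind of saving consistent with the reported 20-30% speedup.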
📦 Installation

Maven

```xml
<dependency>
    <groupId>io.github.beehive-lab</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version>0.3.0</version>
</dependency>
```

Gradle

```groovy
implementation 'io.github.beehive-lab:gpu-llama3:0.3.0'
```

📖 Documentation | 🔗 Maven Central
v0.2.2
What's Changed
- Fully working support for LangChain4j
- LlamaApp cleanup by @orionpapadakis in #51
- Fix execution path control by @orionpapadakis in #52
- Add support for encoding ordinary text in Qwen3Tokenizer and update Q… by @mikepapadim in #53
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Minor cleanup by @mikepapadim in #47
- Add `useTornadovm` flag to model loader to handle Builder option in Langchain4j by @mikepapadim in #50
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Model Support
- Mistral – support for GGUF-format Mistral models with optimized GPU execution.
- Qwen2.5 – GGUF-format Qwen2.5 models supported, including performance improvements for attention layers.
- Qwen3 – compatible with GGUF-format Qwen3 models and updated integration.
- DeepSeek-R1-Distill-Qwen-1.5B – GGUF-format DeepSeek distilled models supported for efficient inference.
- Phi-3 – full support for GGUF-format Microsoft Phi-3 models for high-performance workloads.
What's Changed
- [refactor] Renamed aux package to resolve Windows issue by @stratika in #11
- Windows support for GPULlama3.java by @stratika in #12
- [API] Update TornadoVM API to use latest warmup features by @mikepapadim in #13
- [model] Add support for Mistral models by @orionpapadakis in #17
- Cleanups post Mistral Integration by @mikepapadim in #27
- Add a Docker section to README with available images and usage examples by @mikepapadim in #28
- Refactor TornadoVMMasterPlan to simplify scheduling decision for non-Nvidia HW and Mistral Models by @mikepapadim in #32
- File not found error handling in loadModel method in GGUF.java by @dhruvarayasam in #34
- Update README for clarity by @mikepapadim in #36
- [models] Support for Qwen3 models by @orionpapadakis in #37
- [models][phi-3] Support for Microsoft's Phi-3 models by @mikepapadim in #38
- Reorganize package structure and update imports to use `org.beehive.g…` by @mikepapadim in #42
- Update README.md by @kotselidis in #44
- [models][deepseek][qwen2.5] Add support for Qwen2.5 and Deepseek-Distilled-Qwen models by @orionpapadakis in #40
- Improve attention performance for qwen2.5 & deepseek by @orionpapadakis in #46
New Contributors
- @orionpapadakis made their first contribution in #17
- @dhruvarayasam made their first contribution in #34
Full Changelog: v0.1.0-beta...v0.2.0
v0.1.0-beta
- Llama 3 model compatibility - Full support for Llama 3.0, 3.1, and 3.2 models
- GGUF format support - Native handling of GGUF model files
- Support for FP16 models for reduced memory usage and faster computation
- GPU Acceleration on NVIDIA GPUs using both OpenCL and PTX backends
- [Experimental] Support for Apple Silicon (M1/M2/M3) via OpenCL (subject to hardware/compiler limitations)
- [Experimental] Initial support for Q8 and Q4 quantized models, using runtime dequantization to FP16 (a Q8_0 sketch follows below)
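
As background for the quantized-model bullet above: in GGUF's Q8_0 encoding, weights are stored in blocks of 32 signed 8-bit values, each block preceded by a single FP16 scale, so dequantization is simply `value = scale * quant`. A minimal, self-contained sketch of that decode step follows; the class and method names are illustrative and this is not the project's loader code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal sketch of GGUF Q8_0 runtime dequantization. Each block is one
// FP16 scale followed by 32 signed int8 quants; value = scale * quant.
// Illustrative only; not the project's actual loader code.
public final class Q8_0DequantSketch {
    static final int BLOCK_SIZE = 32; // values per Q8_0 block

    /** Decodes Q8_0 bytes to floats; numValues is assumed a multiple of 32. */
    static float[] dequantize(byte[] raw, int numValues) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[numValues];
        for (int i = 0; i < numValues; i += BLOCK_SIZE) {
            float scale = Float.float16ToFloat(buf.getShort()); // FP16 scale ("delta")
            for (int j = 0; j < BLOCK_SIZE; j++) {
                out[i + j] = scale * buf.get();                 // int8 quant -> float
            }
        }
        return out;
    }
}
```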