
⚡ FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs

Run large language models — now with Vision, Audio, Embedding and MoE support — on AMD Ryzen™ AI NPUs in minutes.
No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-Lightweight (16 MB). Installs within 20 seconds.

📦 The only out-of-box, NPU-first runtime built exclusively for Ryzen™ AI.
🤝 Think Ollama — but deeply optimized for NPUs.
From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.

FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Kraken).


🔗 Quick Links

🔽 Download | 📊 Benchmarks | 📦 Model List

📖 Docs | 📺 Demos | 🧪 Test Drive | 💬 Discord


🚀 Quick Start

A packaged FLM Windows installer is available here: flm-setup.exe. For more details, see the release notes.

📺 Watch the quick start video

Important

⚠️ Ensure the NPU driver version is >= 32.0.203.311 (check via Task Manager → Performance → NPU, or via Device Manager).

⚙️ Tip:

After installation, open PowerShell (Win + X → I). To run a model in the terminal (CLI Mode):

flm run llama3.2:1b

Notes:

  • Internet access to HuggingFace is required to download the optimized model kernels.
  • Sometimes downloads from HuggingFace may get corrupted. If this happens, run flm pull <model_tag> --force (e.g. flm pull llama3.2:1b --force) to re-download and fix them.
  • By default, models are stored in: C:\Users\<USER>\Documents\flm\models\
  • During installation, you can select a different base folder (e.g., if you choose C:\Users\<USER>\flm, models will be saved under C:\Users\<USER>\flm\models\).
  • ⚠️ If HuggingFace is not accessible in your region, manually download the model (check this issue) and place it in the chosen directory.

🎉🚀 FastFlowLM (FLM) is ready — your NPU is unlocked and you can start chatting with models right away!

Open Task Manager (Ctrl + Shift + Esc). Go to the Performance tab → click NPU to monitor usage.

⚡ Quick Tips:

  • Use /verbose during a session to turn on performance reporting (toggle off with /verbose again).
  • Type /bye to exit a conversation.
  • Run flm list in PowerShell to show all available models.

To start the local server (Server Mode):

flm serve llama3.2:1b

The model tag (e.g., llama3.2:1b) is optional and only sets the initial model. If a different model is requested, FastFlowLM automatically switches to it. The local server listens on port 52625 by default.
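As a minimal sketch, you can talk to the running server from Python with the official openai client pointed at the local endpoint. The /v1 route and the placeholder API key are assumptions based on the OpenAI-compatible API mentioned below; check the FastFlowLM docs for the exact paths.

```python
# Minimal sketch: chat with a model served by `flm serve` via its OpenAI-compatible API.
# Assumptions: the server exposes an OpenAI-style /v1 route on the default port 52625,
# and any placeholder API key is accepted since inference is local. Verify in the FLM docs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:52625/v1", api_key="flm")

response = client.chat.completions.create(
    model="llama3.2:1b",  # FLM switches models automatically if a different tag is requested
    messages=[{"role": "user", "content": "Explain what an NPU is in one sentence."}],
)
print(response.choices[0].message.content)
```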

FastFlowLM Docs


📰 In the News


🧠 Local AI on NPU

FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:

  • ⚡ Fast and low power
  • 🧰 Simple CLI and API (REST and OpenAI-compatible; see the sketch below)
  • 🔐 Fully private and offline

No model rewrites, no tuning — it just works.
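For the raw REST side, a plain HTTP request works as well. The sketch below assumes an OpenAI-style /v1/chat/completions route on the default port 52625; the exact paths are documented in the FastFlowLM docs.

```python
# Minimal sketch of a raw REST call to a local FLM server started with `flm serve`.
# The /v1/chat/completions path is an assumption (OpenAI-style); adjust per the FLM docs.
import requests

payload = {
    "model": "llama3.2:1b",
    "messages": [{"role": "user", "content": "Give me a haiku about local inference."}],
    "stream": False,
}

r = requests.post("http://localhost:52625/v1/chat/completions", json=payload, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```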


✅ Highlights

  • Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
  • Lightweight runtime (16 MB) — installs within 20 seconds, easy to integrate
  • Developer-first flow — like Ollama, but optimized for NPU
  • Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
  • No low-level tuning required — you focus on your app, we handle the rest

📄 License

  • All orchestration code and CLI tools are open-source under the MIT License.
  • NPU-accelerated kernels are proprietary binaries, free for commercial use up to USD 10 million in annual company revenue.
  • Companies exceeding this threshold (USD 10 million) must obtain a commercial license. See LICENSE_BINARY.txt and TERMS.md for full details.
  • Free-tier users: Please acknowledge FastFlowLM in your README/project page (or product) as follows:
    Powered by [FastFlowLM](https://github.com/FastFlowLM/FastFlowLM)
    

For commercial licensing inquiries, email us: [email protected]


💬 Have feedback or issues, or want early access to our new releases? Open an issue or join our Discord community


🙏 Acknowledgements