
⚡ FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs

Run large language models — now with Vision, Audio, Embedding and MoE support — on AMD Ryzen™ AI NPUs in minutes.
No GPU required. Faster and over 10× more power-efficient. Supports context lengths up to 256k tokens. Ultra-Lightweight (16 MB). Installs within 20 seconds.

📦 The only out-of-box, NPU-first runtime built exclusively for Ryzen™ AI.
🤝 Think Ollama — but deeply optimized for NPUs.
From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.

FastFlowLM (FLM) supports all Ryzen™ AI Series chips with XDNA2 NPUs (Strix, Strix Halo, and Kraken).


🔗 Quick Links

🔽 Download | 📊 Benchmarks | 📦 Model List

📖 Docs | 📺 Demos | 🧪 Test Drive | 💬 Discord


🚀 Quick Start

A packaged FLM Windows installer is available here: flm-setup.exe. For more details, see the release notes.

📺 Watch the quick start video

Important

⚠️ Ensure the NPU driver version is >= 32.0.203.311 (check via Task Manager → Performance → NPU, or via Device Manager).

⚙️ Tip:

After installation, open PowerShell (Win + X → I). To run a model in the terminal (CLI Mode):

flm run llama3.2:1b

Notes:

  • Internet access to HuggingFace is required to download the optimized model kernels.
  • Sometimes downloads from HuggingFace may get corrupted. If this happens, run flm pull <model_tag> --force (e.g. flm pull llama3.2:1b --force) to re-download and fix them.
  • By default, models are stored in: C:\Users\<USER>\Documents\flm\models\
  • During installation, you can select a different base folder (e.g., if you choose C:\Users\<USER>\flm, models will be saved under C:\Users\<USER>\flm\models\).
  • ⚠️ If HuggingFace is not accessible in your region, manually download the model (check this issue) and place it in the chosen directory.

🎉🚀 FastFlowLM (FLM) is ready — your NPU is unlocked and you can start chatting with models right away!

Open Task Manager (Ctrl + Shift + Esc). Go to the Performance tab → click NPU to monitor usage.

⚡ Quick Tips:

  • Use /verbose during a session to turn on performance reporting (toggle off with /verbose again).
  • Type /bye to exit a conversation.
  • Run flm list in PowerShell to show all available models.

To start the local server (Server Mode):

flm serve llama3.2:1b

The model tag (e.g., llama3.2:1b) is optional and only sets the initial model. If a different model is requested, FastFlowLM automatically switches to it. The local server listens on port 52625 by default.
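As a minimal sketch, you can talk to the running server from Python with the official openai client pointed at the local endpoint. The /v1 route and the placeholder API key are assumptions based on the OpenAI-compatible API mentioned below; check the FastFlowLM docs for the exact paths.

```python
# Minimal sketch: chat with a model served by `flm serve` via its OpenAI-compatible API.
# Assumptions: the server exposes an OpenAI-style /v1 route on the default port 52625,
# and any placeholder API key is accepted since inference is local. Verify in the FLM docs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:52625/v1", api_key="flm")

response = client.chat.completions.create(
    model="llama3.2:1b",  # FLM switches models automatically if a different tag is requested
    messages=[{"role": "user", "content": "Explain what an NPU is in one sentence."}],
)
print(response.choices[0].message.content)
```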

FastFlowLM Docs


📰 In the News


🧠 Local AI on NPU

FLM makes it easy to run cutting-edge LLMs (and now VLMs) locally with:

  • ⚡ Fast and low power
  • 🧰 Simple CLI and API (REST and OpenAI-compatible; see the sketch below)
  • 🔐 Fully private and offline

No model rewrites, no tuning — it just works.
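For the raw REST side, a plain HTTP request works as well. The sketch below assumes an OpenAI-style /v1/chat/completions route on the default port 52625; the exact paths are documented in the FastFlowLM docs.

```python
# Minimal sketch of a raw REST call to a local FLM server started with `flm serve`.
# The /v1/chat/completions path is an assumption (OpenAI-style); adjust per the FLM docs.
import requests

payload = {
    "model": "llama3.2:1b",
    "messages": [{"role": "user", "content": "Give me a haiku about local inference."}],
    "stream": False,
}

r = requests.post("http://localhost:52625/v1/chat/completions", json=payload, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```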


✅ Highlights

  • Runs fully on AMD Ryzen™ AI NPU — no GPU or CPU load
  • Lightweight runtime (16 MB) — installs within 20 seconds, easy to integrate
  • Developer-first flow — like Ollama, but optimized for NPU
  • Support for long context windows — up to 256k tokens (e.g., Qwen3-4B-Thinking-2507)
  • No low-level tuning required — you focus on your app, we handle the rest

📄 License

  • All orchestration code and CLI tools are open-source under the MIT License.
  • NPU-accelerated kernels are proprietary binaries, free for commercial use up to USD 10 million in annual company revenue.
  • Companies exceeding this threshold (USD 10 million) must obtain a commercial license. See LICENSE_BINARY.txt and TERMS.md for full details.
  • Free-tier users: Please acknowledge FastFlowLM in your README/project page (or product) as follows:
    Powered by [FastFlowLM](https://github.com/FastFlowLM/FastFlowLM)
    

For commercial licensing inquiries, email us: [email protected]


💬 Have feedback or issues, or want early access to our new releases? Open an issue or join our Discord community


🙏 Acknowledgements