Local and arbitrary model support #9619
Replies: 30 comments 30 replies
-
|
Answers to the Questions part: I am not sure I have a good answer for number 1 and 3 but 2 it's extremely important to me that local model is truly local that is likely the point of why I am using a local model to start with for that task. I haven't been using warp for a long time because it wasn't open source so I do not have many thoughts on what parts of the UI I want at this time for local models, the little I have done with like Claude cli and the information that the UI provides for that is really nice and to have something like that for local models would be cool but it might depend on what harness people are using idk. |
Beta Was this translation helpful? Give feedback.
-
|
No server transit for any requests |
Beta Was this translation helpful? Give feedback.
-
|
Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands. |
Beta Was this translation helpful? Give feedback.
-
|
Thanks for opening up some discussion! I think for your options, #1 is most appealing, but yes, more work. I'd like to think it could also help your architecture long term by relying less on scaling server side components along with client components. #2 is a bit hacky but could be quick. I connect to my Hermes agent using OpenWebUI because it is nicer than the raw terminal client - I'd connect to it through Warp if it were an option, but that doesn't really support Warp being a standalone tool. #3 could be most practical because it would be fully local, not require a second tool, and for most local models, not being feature-complete would be OK. The smaller context windows mean fewer turns, fewer tools available, etc. But my use case in Warp would mostly be "uh help me remember how this command is used" not "build out an entire ansible deployment for a lab". #4 is probably not worth doing - if someone wants to use a local model, its because they want it local. I'd rank them - 1, 3, 2, 4 For the questions:
|
Beta Was this translation helpful? Give feedback.
-
|
My vote is strongly for option 1. If Warp supports local or arbitrary models, I think it should mean truly local execution with no server transit. I understand that porting the harness to the client and open sourcing it is the most work, but it seems like the right long term architecture for privacy, offline use, trust, and extensibility. Thanks! |
Beta Was this translation helpful? Give feedback.
-
|
hey got a warp fork of my own ,trying to get the ollama support work repo:https://github.com/crazygamerZ783/warp-ollama |
Beta Was this translation helpful? Give feedback.
-
|
Honestly, I want to use DeepSeek V4 Flash inside Warp — it’s cheap, and it allows me to interact with the terminal using natural language. |
Beta Was this translation helpful? Give feedback.
-
|
Local models in Ollama or similar should be configurable as sources within Warp. Once available, you should be able to select a model during a session, either manually or by directing Warp to use it automatically based on the task or preference. |
Beta Was this translation helpful? Give feedback.
-
|
6666 |
Beta Was this translation helpful? Give feedback.
-
|
There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo. People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream. In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers. |
Beta Was this translation helpful? Give feedback.
-
|
Model selection should be allowed to be
|
Beta Was this translation helpful? Give feedback.
-
|
All this feedback makes sense. We will have a proposed solution here shortly. |
Beta Was this translation helpful? Give feedback.
-
|
I am mostly interested as I want to use one source of models (openrouter/GLM Coding Plan) for it. |
Beta Was this translation helpful? Give feedback.
-
|
Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands. |
Beta Was this translation helpful? Give feedback.
-
|
There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo. People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream. In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers. |
Beta Was this translation helpful? Give feedback.
-
|
Adding my two cents here for what it's worth: |
Beta Was this translation helpful? Give feedback.
-
I basically wouldn't change anything about it...I like the folder navigation, the agent tool belt, the ui is clear and appealing...the fact that at each session it can auto-detect the user intent, be it a terminal command or a natural language request to the agent. Probably too many sidebars. Also local models should be truly local, but let the user decide, and keep the account signin/signup if anyone wants to tinker with what warp offers as its models. Just don't paywall features pls |
Beta Was this translation helpful? Give feedback.
-
|
Thanks for opening this up for discussion. I wanted to share my perspective as a user, as my requirements for local LLM support are driven by privacy , Corporate policy and workflow needs. 1. On the "harness" vs. ACP client:For my workflow, the specific protocol matters less than the capability. As long as the implementation (whether it's an ACP client or a lite harness) can seamlessly pull in the context I need to be productive, I don't have a strong preference between the two. 2. On the importance of "truly local" requests:This is the most important point for me. In my professional environment is privacy and governance. Policies are strict: code, secrets, and sensitive data must stay local. 3. On the most important UI aspects:The reason I use Warp is the "rich" terminal experience. If I move to local models, I still want the AI to feel integrated into the terminal, be context-aware, be able to see my current command buffer, and capable of assisting within the flow of my work. Privacy is my primary driver: I am looking for a way to use LLMs without compromising my data. |
Beta Was this translation helpful? Give feedback.
-
|
A local terminal should, first and foremost, function offline; online services should be optional. However, Warp has completely inverted this paradigm—which is precisely the primary reason I do not use it. That said, now that the project has been open-sourced, I felt compelled to offer my two cents. So, is this so-called "terminal" essentially just a conduit for piping all local data to an AI service provider? |
Beta Was this translation helpful? Give feedback.
-
|
Voting option 2 (Warp as ACP client) as the immediate unlock, with option 3 (a lite local Rust harness) as a longer-term parallel track. Option 4 (ngrok-style routing) doesn't satisfy the local-model use case at all in our experience. Why option 2 first:
Answers to your three questions, from a small-team-using-Warp-daily perspective:
Concrete grounding: we're targeting ~128GB unified-memory boxes (Jetson Thor / Apple silicon) running Devstral Small 2 (24B, Q8) and Qwen3-Coder-30B-A3B locally via |
Beta Was this translation helpful? Give feedback.
-
|
Thank you for being open to this and engaging the community. I think you've overwhelming heard from others that they want option 1. I'd say the same. If you implement this, I'll absolutely return to warp. I'm looking forward to it! |
Beta Was this translation helpful? Give feedback.
-
|
Answers to the Questions part: I am not sure I have a good answer for number 1 and 3 but 2 it's extremely important to me that local model is truly local that is likely the point of why I am using a local model to start with for that task. I haven't been using warp for a long time because it wasn't open source so I do not have many thoughts on what parts of the UI I want at this time for local models, the little I have done with like Claude cli and the information that the UI provides for that is really nice and to have something like that for local models would be cool but it might depend on what harness people are using idk. |
Beta Was this translation helpful? Give feedback.
-
|
I agree with the consensus that #1 is of course the best option, and I would like to point out that I am most interested in warp in its capabilities as a harness. Like most programmers, I still do use cloud models for the majority of my tasks, and the ability to use my OpenAI Codex sub while still taking advantage of the harness (like Opencode for example) would push me to use warp more. If this can be achieved in a lightweight manner, option #3 is the clear choice. If not then #1 |
Beta Was this translation helpful? Give feedback.
-
|
Updates on our plan here... With respect to BYOK and arbitrary endpoint support:
With respect to local model support, our plan is to
Thanks to everyone who has weighed in on this thread. |
Beta Was this translation helpful? Give feedback.
-
|
Main obstacle for using Warp at our company is the privacy: we have the strong governance model and we can't use 3rd party LLM providers. We would use BYOK and point it to the private instance hosted in Azure, AWS etc. Buying a license is not an issue: we just need to make sure that data never goes to any of Warp servers, which probably makes billing a bit challenging for Warp. We tried other solutions with BYOK, but none of them work with agents as good as Warp. |
Beta Was this translation helpful? Give feedback.
-
|
From an enterprise/privacy perspective, the important distinction is not only "local model support" but "local execution boundary." If prompts, repository context, tool traces, or intermediate reasoning still route through a remote harness, many regulated users will not be able to use it even if the final model endpoint is self-hosted. I would rank the options this way:
A good product surface would show an "effective data path" for each provider: what stays local, what goes to Warp, what goes to the model endpoint, and what is logged. That is the question security teams will ask before approving local/private model workflows. |
Beta Was this translation helpful? Give feedback.
-
|
Came here as heard Warp now supporting Windows. Installed and then immediately uninstalled after realising the product is effectively useless without sign up and sending data to yet another provider. With most workstation level laptops now coming with a dedicated NPU or GPU with a few gig of VRAM (even shared RAM is OK for lower end qwen models) people may as well make use of to make terminal life easier. Personally just want an AI shell for system management and basic automation:
Not interested in coding capabilities as use other tools for that. 1 and 3 are the better options. Given there are already forks in the wild why not encourage them to PR (if they haven't already) |
Beta Was this translation helpful? Give feedback.
-
|
It's frustrating how big companies constantly try to seize control of your computer and data. LLMs and their harnesses should act as assistants to the terminal, not as supervisors. If Warp continues with its closed mindset, open alternatives like OpenWarp or other terminal+LLM apps will thrive and take its place. |
Beta Was this translation helpful? Give feedback.
-
|
If I can contribute in any way whatsoever whether it's bug bounty for your project or contributing code. Please let me know I'd be highly interested. I have a personal vendetta against warp. I was just thinking along the lines of instead of starting. My own repository in starting this whole task, out from scratch, I would join the community. This is something that I did not do with deep seek |
Beta Was this translation helpful? Give feedback.
-
|
now that's music to the ears,Also add ability to fetch model, for /models endpoint, as well as context windows, and whether model has vision or not. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
We are trying to figure out the best way to implement local model support and I wanted to start a discussion on our different potential approaches to see what resonates most with the community.
The reason local model support is not trivial for us to implement is that our harness is split between our client (rust, open-source) and server (golang, not currently open). Moving the harness to be entirely on the client is a fair amount of work.
The options we are considering here (not mutually exclusive):
Questions on my mind:
Beta Was this translation helpful? Give feedback.
All reactions