Add streaming support for chat: fix async generator bug and add documentation #7187

**Documentation** (hunk `@@ -132,6 +132,92 @@`, added after the existing `mo.ui.chat(...)` example):

## Streaming Responses

Chatbots can stream responses in real time, creating a more interactive experience
similar to ChatGPT, where the response appears word by word as it's generated.

### With Built-in Models

For built-in models (OpenAI, Anthropic, Google, Groq, Bedrock), set `stream=True` in the model constructor:

```python
import marimo as mo

chat = mo.ui.chat(
    mo.ai.llm.openai(
        "gpt-4o",
        system_message="You are a helpful assistant.",
        stream=True,  # Enable streaming
    ),
    show_configuration_controls=True
)
chat
```

This works for all built-in models:

- `mo.ai.llm.openai("gpt-4o", stream=True)`
- `mo.ai.llm.anthropic("claude-3-5-sonnet-20240620", stream=True)`
- `mo.ai.llm.google("gemini-1.5-pro-latest", stream=True)`
- `mo.ai.llm.groq("llama-3.1-70b-versatile", stream=True)`
- `mo.ai.llm.bedrock("anthropic.claude-3-7-sonnet-20250219-v1:0", stream=True)`

### With Custom Models

For custom models, you can use either a regular (sync) or an async generator function that yields intermediate results:

**Sync generator (simpler):**

```python
import marimo as mo
import time


def streaming_model(messages, config):
    """Stream responses word by word."""
    response = "This response will appear word by word!"
    words = response.split()
    accumulated = ""

    for word in words:
        accumulated += word + " "
        yield accumulated
        time.sleep(0.1)  # Simulate processing delay


chat = mo.ui.chat(streaming_model)
chat
```

A review thread on the `yield accumulated` line:

> **Contributor:** Should this be just `word`?
>
> **Author:** The accumulation must happen server-side. Each yield sends the full accumulated text to the frontend, which displays it. If we yielded just `word`, the frontend would show only the latest word rather than the growing response.
**Async generator (for async operations):**

```python
import marimo as mo
import asyncio


async def async_streaming_model(messages, config):
    """Stream responses word by word asynchronously."""
    response = "This response will appear word by word!"
    words = response.split()
    accumulated = ""

    for word in words:
        accumulated += word + " "
        yield accumulated
        await asyncio.sleep(0.1)  # Async processing delay


chat = mo.ui.chat(async_streaming_model)
chat
```

Each `yield` sends an update to the frontend, and the chat UI displays
the progressively accumulated response in real time.
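Many SDKs stream deltas (only the newly generated token), while the chat UI displays whatever you yield; as the review thread above explains, each yield should therefore carry the full accumulated text. Below is a minimal sketch of bridging the two, where `fake_token_stream` and `accumulating_model` are hypothetical names standing in for your own delta-yielding source and wrapper (they are not marimo APIs):

```python
import asyncio

import marimo as mo


async def fake_token_stream():
    # Hypothetical stand-in for an SDK that yields deltas (new tokens only).
    for token in ["Streaming ", "one ", "token ", "at ", "a ", "time."]:
        await asyncio.sleep(0.05)
        yield token


async def accumulating_model(messages, config):
    """Bridge a delta-yielding stream to the accumulated-text updates shown in the chat UI."""
    accumulated = ""
    async for delta in fake_token_stream():
        accumulated += delta
        yield accumulated  # Yield the full text so far, not just the new delta


chat = mo.ui.chat(accumulating_model)
chat
```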
!!! tip "See streaming examples"
    For complete working examples, check out:

    - [`streaming_openai.py`](https://github.com/marimo-team/marimo/blob/main/examples/ai/chat/streaming_openai.py) - Streaming with OpenAI models
    - [`streaming_custom.py`](https://github.com/marimo-team/marimo/blob/main/examples/ai/chat/streaming_custom.py) - Custom streaming chatbot

## Built-in Models

marimo provides several built-in AI models that you can use with the chat UI
**`examples/ai/chat/streaming_custom.py`** (new file, hunk `@@ -0,0 +1,102 @@`):

````python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
# ]
# ///

import marimo

__generated_with = "0.17.8"
app = marimo.App(width="medium")


@app.cell
def _():
    import marimo as mo
    import asyncio
    return asyncio, mo


@app.cell(hide_code=True)
def _(mo):
    mo.md("""
    # Custom streaming chatbot

    This example shows how to make a chatbot that streams responses.
    Create an async generator function that yields intermediate results,
    and watch the response appear incrementally!
    """)
    return


@app.cell
def _(asyncio, mo):
    async def streaming_echo_model(messages, config):
        """This chatbot echoes what the user says, word by word."""
        # Get the user's message
        user_message = messages[-1].content

        # Stream the response word by word
        response = f"You said: '{user_message}'. Here's my response streaming word by word!"
        words = response.split()
        accumulated = ""

        for word in words:
            accumulated += word + " "
            yield accumulated
            await asyncio.sleep(0.2)  # Delay to make streaming visible

    chatbot = mo.ui.chat(
        streaming_echo_model,
        prompts=["Hello", "Tell me a story", "What is streaming?"],
        show_configuration_controls=True
    )
    return (chatbot,)


@app.cell
def _(chatbot):
    chatbot
    return


@app.cell
def _(mo):
    mo.md("""
    ## How it works

    The key is to make your model function an **async generator**:

    ```python
    async def my_model(messages, config):
        response = 'Building up text...'
        accumulated = ''
        for part in response.split():
            accumulated += part + ' '
            yield accumulated  # Each yield updates the UI
            await asyncio.sleep(0.1)
    ```

    Each `yield` sends an update to the frontend, creating a smooth streaming effect!
    """)
    return


@app.cell
def _(mo):
    mo.md("""
    Access the chatbot's historical messages with `chatbot.value`.
    """)
    return


@app.cell
def _(chatbot):
    # chatbot.value is the list of chat messages
    chatbot.value
    return


if __name__ == "__main__":
    app.run()
````
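The last two cells surface `chatbot.value`, the chat history. A small, hypothetical helper for turning that history into a readable summary; it assumes each message exposes `role` and `content` attributes, matching the `messages[-1].content` access in the model function above, and the helper name `summarize_history` is illustrative, not a marimo API:

```python
# Hypothetical helper: render the chat history stored in chatbot.value as Markdown.
# Assumes each message exposes `role` and `content` attributes.
def summarize_history(messages):
    if not messages:
        return "_No messages yet._"
    lines = [
        f"- **{getattr(m, 'role', 'unknown')}**: {getattr(m, 'content', '')}"
        for m in messages
    ]
    return "\n".join(lines)


# In a marimo cell, make this the last expression so it renders:
# mo.md(summarize_history(chatbot.value))
```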
**`examples/ai/chat/streaming_openai.py`** (new file, hunk `@@ -0,0 +1,117 @@`):

````python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "openai>=1.55.3",
# ]
# ///

import marimo

__generated_with = "0.17.8"
app = marimo.App(width="medium")


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell(hide_code=True)
def _(mo):
    mo.md("""
    # OpenAI streaming chatbot

    This example shows how to use OpenAI's API with streaming responses.
    The built-in `mo.ai.llm.openai()` model automatically streams tokens
    as they arrive from the API!

    Enter your API key below to try it out.
    """)
    return


@app.cell
def _(mo):
    api_key_input = mo.ui.text(
        placeholder="sk-...",
        label="OpenAI API Key",
        kind="password",
    )
    api_key_input
    return (api_key_input,)


@app.cell
def _(api_key_input, mo):
    if api_key_input.value:
        chatbot = mo.ui.chat(
            mo.ai.llm.openai(
                "gpt-4o-mini",
                system_message="You are a helpful assistant. Keep responses concise and friendly.",
                api_key=api_key_input.value,
                stream=True,  # Enable streaming
            ),
            prompts=[
                "Tell me a short joke",
                "What is Python?",
                "Explain streaming in one sentence",
            ],
            show_configuration_controls=True,
        )
    else:
        chatbot = mo.md("*Enter your OpenAI API key above to start chatting*")
    return (chatbot,)


@app.cell
def _(chatbot):
    chatbot
    return


@app.cell
def _(mo):
    mo.md("""
    ## How it works

    The built-in OpenAI model returns an async generator that yields tokens
    as they stream from the API:

    ```python
    mo.ui.chat(
        mo.ai.llm.openai(
            "gpt-4o-mini",
            api_key="your-key",
            stream=True,  # Enable streaming!
        )
    )
    ```

    Set `stream=True` to enable streaming responses. 🚀

    Other built-in models (`anthropic`, `google`, `groq`) work the same way.
    """)
    return

@app.cell
def _(chatbot, mo):
    # Show chat history count. Keep this as the cell's last expression so it
    # renders; mo.md() inside an `if` statement would not be displayed.
    (
        mo.md(f"**Chat history:** {len(chatbot.value)} messages")
        if hasattr(chatbot, "value")
        else None
    )
    return


@app.cell
def _(chatbot):
    # Display the full history (only when a real chatbot, not the placeholder, is shown)
    chatbot.value if hasattr(chatbot, "value") else None
    return


if __name__ == "__main__":
    app.run()
````
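For readers curious what `stream=True` corresponds to conceptually, here is a rough, hand-rolled equivalent written as a custom model using the OpenAI Python SDK's streaming interface. This is a sketch, not marimo's actual implementation: the name `manual_openai_stream` is hypothetical, it only forwards the latest user message, and the API key is a placeholder.

```python
from openai import OpenAI


def manual_openai_stream(messages, config):
    """Sketch of a custom model that streams from the OpenAI SDK and accumulates deltas."""
    client = OpenAI(api_key="your-key")  # placeholder key
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": messages[-1].content}],
        stream=True,
    )
    accumulated = ""
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        accumulated += delta
        yield accumulated  # The chat UI displays the full text so far


# chat = mo.ui.chat(manual_openai_stream)
```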
Review discussion:

> **Reviewer:** If this works out of the box with built-in models, should streaming be enabled by default?
>
> **Author:** Good question! I kept it `False` by default for backward compatibility - existing code shouldn't change behavior. But I can see the argument for `True` as default since it's a better UX and works with all built-in models. Happy to change it if you think that's the right call - what do you think?
>
> **Reviewer:** Since all built-in models support streaming, I'd say just remove the argument and always stream the response. I can't think of any case when a user wouldn't want to stream the response (are there any?)