Conversation

@adamwdraper adamwdraper commented Nov 16, 2025

[Screen recording: Kapture 2025-11-18 at 09 08 50]

Summary

This PR adds real-time streaming support to mo.ui.chat, enabling ChatGPT-like experiences where responses appear word-by-word as they're generated.

What's New

✨ Streaming Support for Chat Models

Users can now stream responses from chat models in real-time:

With built-in models (OpenAI, etc.):

import marimo as mo

chat = mo.ui.chat(
    mo.ai.llm.openai(
        "gpt-4o",
        stream=True,  # Enable streaming! Uses a sync generator internally
    )
)

With custom models (sync or async generators):

# Custom models can use EITHER sync or async generators
def my_streaming_model(messages, config):
    accumulated = ""
    for word in str(messages[-1].content).split():
        accumulated += word + " "
        yield accumulated  # Sync generator - works!

async def my_async_model(messages, config):
    accumulated = ""
    for word in str(messages[-1].content).split():
        accumulated += word + " "
        yield accumulated  # Async generator - also works!
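
Either flavor can be passed directly to mo.ui.chat (usage sketch for the illustrative models above):

chat = mo.ui.chat(my_streaming_model)  # or mo.ui.chat(my_async_model)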

📚 Documentation

  • Added comprehensive "Streaming Responses" section to chat API docs
  • Explains both built-in model streaming and the custom generator approach (sync or async)
  • Includes links to working examples (streaming_openai.py and streaming_custom.py)

🎯 Examples

  • streaming_openai.py: Demonstrates streaming with OpenAI models
  • streaming_custom.py: Shows how to build custom streaming chatbots
  • Fixed quote escaping in examples to prevent file corruption on auto-save

Implementation Details

  • Added stream parameter to openai ChatModel class
  • Implemented async generator support in chat UI element
  • Frontend receives incremental updates via stream_chunk messages
  • Final message marked with is_final: true flag
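
For reference, each incremental update is a small payload like the following (a sketch assembled from the fields named above and in the review discussion further down, not a frozen schema):

import uuid

message_id = str(uuid.uuid4())    # generated once per streamed response
chunk = {
    "type": "stream_chunk",
    "message_id": message_id,
    "content": "Once upon a ",    # full accumulated text so far
    "is_final": False,            # True only on the final chunk
}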

Testing

  • Added test_chat_streaming_sends_messages to verify streaming behavior
  • Manually tested both OpenAI and custom streaming examples
  • Code passes lint and format checks

User Experience

Users will see responses appear progressively in the chat interface, creating a more engaging and responsive experience similar to ChatGPT, Claude, and other modern AI chat interfaces.

Implements real-time streaming for chat UI elements:

Backend changes:
- Modified chat._send_prompt to stream chunks via _send_message for async generators
- Added UUID generation for tracking streaming messages
- Sends incremental chunks with 'stream_chunk' type and 'is_final' flag
- Updated docstring to reflect streaming support

Frontend changes:
- Added MarimoIncomingMessageEvent listener to receive streaming chunks
- Implemented streamingStateRef to track backend message_id and frontend message index
- Creates placeholder messages and updates them in real-time as chunks arrive
- Added host prop to Chatbot component for event listening

AI model changes:
- Converted mo.ai.llm.openai to async generator with stream=True
- Yields accumulated content as tokens arrive from the API
- Enables automatic streaming for built-in OpenAI models

The implementation uses existing SendUIElementMessage infrastructure
for bidirectional communication via WebSockets. Async generator functions
now automatically stream responses to the frontend in real-time.

Resolves the TODO at line 232 in chat.py about streaming support.
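
A condensed sketch of that backend flow (illustrative names and a simplified send callback, not the exact marimo implementation):

import uuid
from typing import AsyncIterator, Callable

async def stream_to_frontend(
    chunks: AsyncIterator[str],
    send_message: Callable[[dict], None],
) -> str:
    # Forward each accumulated chunk, then mark the last message as final
    # so the frontend can replace its placeholder with the complete text.
    message_id = str(uuid.uuid4())
    accumulated = ""
    async for accumulated in chunks:
        send_message(
            {
                "type": "stream_chunk",
                "message_id": message_id,
                "content": accumulated,
                "is_final": False,
            }
        )
    send_message(
        {
            "type": "stream_chunk",
            "message_id": message_id,
            "content": accumulated,
            "is_final": True,
        }
    )
    return accumulated
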
- Backend: Send streaming chunks via WebSocket messages (_send_message)
  - Async generators now stream responses to frontend in real-time
  - Each chunk includes message_id, content, and is_final flag
  - Updated documentation to reflect streaming support

- Frontend: Listen for streaming chunks via MarimoIncomingMessageEvent
  - Track streaming state and update UI as chunks arrive
  - Accumulate and display content incrementally
  - Support both streaming and non-streaming responses

- Built-in models: Add stream parameter to mo.ai.llm.openai
  - Defaults to False for backward compatibility
  - When True, streams tokens from OpenAI API as async generator
  - Supports both streaming and non-streaming modes

- Examples: Add streaming chat examples
  - streaming_custom.py: Shows custom async generator streaming
  - streaming_openai.py: Shows OpenAI API streaming with stream=True
  - Updated README.md with streaming documentation

Streaming creates a ChatGPT-like experience where responses appear
token-by-token as they're generated, improving perceived responsiveness.

- Test verifies streaming async generators send chunks via _send_message
- Validates message structure (type, message_id, content, is_final)
- Confirms intermediate chunks have is_final=False
- Confirms final chunk has is_final=True and accumulated content

Change 'return' to 'yield' in non-streaming path to fix SyntaxError.
In Python, async generators cannot use 'return' with a value - they must
consistently use 'yield' throughout the function.
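
A quick illustration of that rule (standalone snippet; the exact error wording may vary slightly across Python versions, and a bare 'return' without a value remains legal):

snippet = """
async def stream_words():
    yield "chunk"
    return "done"
"""

try:
    compile(snippet, "<example>", "exec")
except SyntaxError as err:
    print(err.msg)  # 'return' with value in async generator
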
- Remove trailing whitespace
- Fix return statement to return tuple with chatbot only
- Use single quotes in markdown code example to avoid escaping issues
  when marimo auto-saves the notebook

- Add 'Streaming Responses' section explaining real-time streaming
- Show how to enable streaming with built-in models (stream=True)
- Show how to implement streaming with custom async generators
- Include links to streaming examples in the repo

@github-actions github-actions bot added the documentation label Nov 16, 2025

github-actions bot commented Nov 16, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@adamwdraper (Author)

I have read the CLA Document and I hereby sign the CLA

- Add stream parameter to anthropic, google, groq, and bedrock models
- Implement streaming logic for each model using their native APIs
- Update documentation to list all models that support streaming
- All models now follow the same pattern: stream=True enables real-time responses

- Add commented-out stream=True parameter to all example files
- Shows users how to enable streaming without changing default behavior
- Covers: OpenAI, Anthropic, Google (Gemini), Groq, and Bedrock examples

- Add noqa comment for unused buffers parameter
- Replace == False with 'not' operator
- Replace == True with direct boolean check

@adamwdraper changed the title from "Fix streaming support for chat: async generator bug and add documentation" to "Add streaming support for chat: async generator bug and add documentation" Nov 16, 2025

for word in words:
    accumulated += word + " "
    yield accumulated

Contributor:

should this be just yield word? seems like the accumulation happens client-side

@adamwdraper (Author) replied Nov 17, 2025:

The accumulation must happen server-side. Each yield sends the full accumulated text to the frontend, which displays it. If we yielded just word, the UI would only show individual words instead of the progressively building response. This is consistent with how all built-in models work (OpenAI, Anthropic, Google, etc.).

chunk_text = str(latest_response)

# Send incremental update to frontend
self._send_message(

Contributor:

could we DRY this up? to:

                self._send_message(
                    {
                        "type": "stream_chunk",
                        "message_id": message_id,
                        "content": accumulated_text,
                        "is_final": latest_response is not None,
                    },
                    buffers=None,
                )

@adamwdraper (Author):

Good catch! Simplified by eliminating the chunk_text variable and directly using accumulated_text. Fixed in d71c725.

return response.choices[0].message.content
if self.stream:
    # Stream the response
    response = litellm_completion(

Contributor:

I think we can write response = litellm_completion( once with stream=self.stream and then just handle the response differently in the if/else

@adamwdraper (Author):

Done! Unified the completion call using stream=self.stream. The different response types are handled in the if/else branches. Fixed in d71c725.
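
A minimal sketch of the unified pattern, assuming litellm's OpenAI-compatible response shapes (function and variable names are illustrative, not the exact marimo code):

from typing import Generator, Union
from litellm import completion as litellm_completion

def respond(model: str, messages: list, stream: bool) -> Union[str, Generator[str, None, None]]:
    # Single completion call; only the handling of `response` differs by mode.
    response = litellm_completion(model=model, messages=messages, stream=stream)
    if stream:
        return _accumulate(response)            # generator of growing text
    return response.choices[0].message.content  # plain string

def _accumulate(response) -> Generator[str, None, None]:
    accumulated = ""
    for chunk in response:
        accumulated += chunk.choices[0].delta.content or ""
        yield accumulated                       # full text so far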

response = client.models.generate_content_stream(
    model=self.model,
    contents=google_messages,
    config={

Contributor:

could we DRY up this config and pull it out above

@adamwdraper (Author):

Agreed! Extracted the config dict to a single generation_config variable. Fixed in d71c725.

- Change delete icon to rotate-cw icon for clearing chat history
- Add disabled state when no messages exist (consistent with send button)
- Improves UX clarity: trash icon was confusing as it looked like deleting the cell rather than resetting the conversation
- Individual message delete buttons still use trash icon appropriately

@adamwdraper (Author)

Additional UI Improvement: Chat Reset Icon Change

Changed the chat reset button icon from trash to rotate-clockwise to improve UX clarity.

Why this change?
The trash icon was confusing as it looked like it would delete the cell itself rather than reset the conversation. The rotate icon better represents the "reset/restart" action.

Additional improvements:

  • Reset button is now disabled when there are no messages (consistent with send button behavior)
  • Individual message delete buttons still use the trash icon appropriately

See commit: 4b987ea

…dels

Problem:
- When stream=False, chat models returned None instead of strings
- Python treats any function with 'yield' as a generator, even if the
  yield is in an unexecuted branch
- This caused __call__ to return a generator object instead of a string
  when stream=False

Solution:
- Extract streaming logic into separate _stream_response() helper methods
- __call__ now only contains return statements (no yield)
- When stream=True: returns generator from helper method
- When stream=False: returns string directly
- Maintains backward compatibility (kept sync def __call__)

Changes:
- All chat models (OpenAI, Anthropic, Google, Groq, Bedrock) updated
- Added sync generator support to chat UI (inspect.isgenerator)
- Added test_chat_sync_generator_streaming() test
- Both streaming and non-streaming modes now work correctly
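
The gotcha in isolation (a standalone illustration, not the marimo code): any 'yield' anywhere in a function body makes Python treat the whole function as a generator function, even if that branch never runs.

def respond(stream: bool):
    if stream:
        yield "chunk"
    else:
        return "full text"   # callers never receive this as a plain string

print(respond(False))        # <generator object respond at 0x...>, not a string

# Fix: keep 'yield' in a helper so the dispatching function stays ordinary.
def respond_fixed(stream: bool):
    if stream:
        return _stream()
    return "full text"

def _stream():
    yield "chunk"

print(respond_fixed(False))  # full text
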
Changes:
- Refactor chat UI to use single _handle_streaming_response() method
  for both sync and async generators (eliminated ~70 lines of duplication)
- Fix streaming cutoff: always yield final accumulated result
  even when last chunk has no content (fixes incomplete responses)
- Add test_chat_streaming_complete_response() to catch cutoff bugs
- Increase default max_tokens from 100 to 4096 tokens
  (100 was way too low, caused stories/responses to cut off mid-sentence)

The streaming cutoff fix ensures that when OpenAI/other APIs send
final chunks with no content (just finish_reason), we still capture
the complete accumulated text. The high default (4096 tokens ~3000 words)
matches industry standards and prevents artificial truncation while still
providing reasonable cost control.

Add return type annotations to all _stream_response() and
_handle_streaming_response() methods to satisfy linting:
- _stream_response() -> Generator[str, None, None]
- _handle_streaming_response() -> str

Also added Generator to typing imports.

Ruff requires type-only imports like Generator to be in a
TYPE_CHECKING block to avoid runtime overhead.

Update documentation to clarify that both sync and async generators
are supported for streaming chat responses:

- docs/api/inputs/chat.md: Added sync generator example alongside async
- chat.py docstring: Clarified both sync and async generators work

Built-in models (OpenAI, Anthropic, etc.) use sync generators internally,
while custom models can use either depending on their needs.

@mscolnick (Contributor)

@adamwdraper looks great! few build errors to fix, but otherwise good to merge afterwards.

we can followup as a team to figure out the correct default for stream

- Fix circular type reference in ChatPlugin.tsx by using inferred types
- Remove forbidden non-null assertion in chat-ui.tsx streaming handler
- Extract frontendMessageIndex before null check to avoid non-null assertion operator

adamwdraper and others added 2 commits November 18, 2025 16:13

- Add type annotations to all _stream_response methods (openai, anthropic, google, groq, bedrock)
- Fix union-attr error in openai response by using cast(Any, response)
- Add missing return statement in anthropic __call__ for else branch
- Fix google generate_content config type with cast(Any, generation_config)
- Add type annotations to chat._handle_streaming_response
- Fix return type in chat._handle_streaming_response to handle None case
- Replace deprecated 'gemini-1.5-pro-latest' with 'gemini-2.5-flash'
- Add google-genai>=1.20.0 dependency to script metadata
- Update generated_with version to 0.17.8

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)
