feat: add Anthropic-compatible serving endpoints #4538
lvhan028 wants to merge 3 commits into InternLM:main
Conversation
Introduce Anthropic-style messages, count_tokens, and model-list endpoints with dedicated per-endpoint handlers so LMDeploy can interoperate with Anthropic-oriented clients while keeping OpenAI routes unchanged.

Made-with: Cursor
Force-pushed from 296c675 to 48395a0
Pull request overview
Adds an Anthropic-compatible API surface to LMDeploy’s serving stack, including message generation, token counting, and Anthropic-scoped model listing.
Changes:
- Introduces the `lmdeploy.serve.anthropic` package with protocol models, adapters, endpoints, and SSE streaming utilities.
- Wires the Anthropic router into the existing OpenAI FastAPI server.
- Adds endpoint tests plus English/Chinese documentation pages and index links.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `lmdeploy/serve/openai/api_server.py` | Mounts the Anthropic router into the main FastAPI app. |
| `lmdeploy/serve/anthropic/router.py` | Assembles the Anthropic endpoint modules into a single router. |
| `lmdeploy/serve/anthropic/endpoints/messages.py` | Implements `POST /v1/messages` (streaming and non-streaming). |
| `lmdeploy/serve/anthropic/endpoints/messages_count_tokens.py` | Implements `POST /v1/messages/count_tokens`. |
| `lmdeploy/serve/anthropic/endpoints/models.py` | Implements `GET /anthropic/v1/models`. |
| `lmdeploy/serve/anthropic/streaming.py` | Converts LMDeploy generation streams into Anthropic-style SSE events. |
| `lmdeploy/serve/anthropic/adapter.py` | Maps between Anthropic request/response shapes and LMDeploy/OpenAI internals. |
| `lmdeploy/serve/anthropic/protocol.py` | Adds Pydantic models for Anthropic-compatible request/response payloads. |
| `tests/test_lmdeploy/serve/anthropic/test_endpoints.py` | Adds tests covering endpoint behavior and SSE shape. |
| `lmdeploy/serve/anthropic/errors.py` | Adds an Anthropic-style error response helper. |
| `docs/en/llm/api_server_anthropic.md` | Documents the Anthropic-compatible endpoints (English). |
| `docs/zh_cn/llm/api_server_anthropic.md` | Documents the Anthropic-compatible endpoints (Chinese). |
| `docs/en/llm/api_server.md` | Links to the new Anthropic docs page. |
| `docs/en/index.rst` / `docs/zh_cn/index.rst` | Adds the new doc page to the navigation. |
```python
app = FastAPI(docs_url='/', lifespan=lifespan)

app.include_router(router)
app.include_router(create_anthropic_router(VariableInterface))
```
The router is created with create_anthropic_router(VariableInterface) (the class) rather than the server_context = VariableInterface() instance used elsewhere in this file (e.g., check_request). Passing the instance improves consistency and avoids surprises if VariableInterface ever gains instance-level state.
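The class-vs-instance difference is easy to demonstrate with a stripped-down sketch. The names below (`ServerContext`, `create_router`) are simplified stand-ins, not LMDeploy's actual classes: a factory that reads attributes off whatever object it receives sees only class-level defaults when handed the class, but sees instance-level state when handed an instance.

```python
class ServerContext:  # stand-in for VariableInterface
    async_engine = None  # class-level default

    def __init__(self):
        self.async_engine = 'per-instance engine'  # instance-level state


def create_router(ctx):
    # stand-in for create_anthropic_router: endpoints close over `ctx`
    # and read ctx.async_engine at request time
    return lambda: ctx.async_engine


from_class = create_router(ServerContext)()       # sees only the class default
from_instance = create_router(ServerContext())()  # sees instance-level state
```

As long as all state stays class-level, both spellings behave identically, which is why the reviewer frames this as a consistency and future-proofing concern rather than a bug.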
```python
if request.tools and (parser_cls is None or parser_cls.tool_parser_cls is None):
    return create_error_response(
        HTTPStatus.BAD_REQUEST,
        'Please launch the api_server with --tool-call-parser if you want to use tool.')
```
The user-facing error message is grammatically awkward: "if you want to use tool." Consider rewording it to "...if you want to use tools." or "...if you want to use tool calling." for clarity.
```diff
-        'Please launch the api_server with --tool-call-parser if you want to use tool.')
+        'Please launch the api_server with --tool-call-parser if you want to use tools.')
```
```python
if block is None:
    closing = _close_current_block()
    if closing:
        events.append(closing)
```
In _start_tool_block, switching to an already-created tool block does not close the currently-open content block (the _close_current_block() call only happens when block is None). If the stream ever interleaves text/thinking with additional tool deltas for the same tool index, this will produce invalid/missing content_block_stop events. Consider always closing the current block when changing current_block kind/index, even when reusing an existing tool block.
```diff
-if block is None:
-    closing = _close_current_block()
-    if closing:
-        events.append(closing)
+target_block_index = block['block_index'] if block is not None else None
+same_block = (
+    current_block is not None and current_block.get('kind') == 'tool_use'
+    and current_block.get('tool_index') == tool_index
+    and current_block.get('block_index') == target_block_index)
+if not same_block:
+    closing = _close_current_block()
+    if closing:
+        events.append(closing)
+if block is None:
```
```python
'content_block': {
    'type': 'tool_use',
    'id': tool_delta.id,
    'name': '',
    'input': {},
},
```
The content_block_start for tool_use is emitted with an empty name even though tool_delta.function.name is typically available at that point. Clients may rely on the tool name being present on the initial tool_use block. Populate the tool name (and store it in tool_blocks) when starting the block, using the delta's function name when it is present.