fix(graph): resolve edge creation failure due to vertex ID mismatch #331
Merged
When loading graph data, the HugeGraph server assigns vertex IDs (e.g., "1:Sarah") that differ from LLM-predicted IDs (e.g., "person:Sarah"). Edge creation then fails with IllegalArgumentException because the edges still reference the LLM-predicted IDs, which do not match the actual vertex IDs in the graph.

Add a vid_mapping to track the ID mapping and update edge references after vertex creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- add regression coverage for LLM vertex ids differing from created ids
- verify edges use server-created vertex ids after vertex creation
- keep the change scoped to commit_to_hugegraph tests

- define deterministic vertex ids from schema label ids
- require edges to reference emitted vertex ids
- add prompt contract tests and validate prompt output with agent

- add root AGENTS guidance for repo-wide module boundaries
- refactor hugegraph-llm AGENTS into concise module rules
- emphasize sufficient and effective test coverage
- align llm test commands with CI external-service skips

- infer vertex and edge type from grouped extraction arrays
- remove redundant item-level type requirement from extraction prompts
- align prompt example resources with deterministic vertex id rules
- add parser and prompt-contract regression coverage

- merge latest main changes for graph parsing and API tests
- resolve property graph extract test conflict by preserving both coverage sets
- keep grouped item type inference tests alongside fenced and flat JSON cases
6715810 to dccff8f
- derive primary-key vertex ids from schema after LLM extraction
- resolve or reject edge endpoints before graph commit
- align prompt examples with the extraction contract
- remove unrelated AGENTS changes from the PR diff
dccff8f to 6755a22
imbajin (Member) approved these changes on May 19, 2026
@linmengmeng-1314 refactor the pipeline for V2
could try to test it when free (Leave some TODOs in desc)
Summary
- The HugeGraph server assigns vertex IDs (e.g., `1:Sarah`) using numeric label IDs, which differ from the LLM-predicted IDs (e.g., `person:Sarah`) that edges reference
- Add `vid_mapping` to track the ID mapping and update edge `outV`/`inV` after vertex creation
- … `type` fields and custom string IDs

Problem
When using "Load into GraphDB" to import extracted graph data, edge creation fails with `java.lang.IllegalArgumentException`. The root cause is that the LLM generates vertex IDs from label names (e.g., `person:Sarah`), but the HugeGraph server uses numeric label IDs (e.g., `1:Sarah`). Edge references still point to the LLM-predicted IDs, so the vertex lookup fails.

Solution
- … `CUSTOMIZE_STRING` vertex IDs and pass them to HugeGraph when creating vertices

Flow
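The ID remapping described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `remap_edge_endpoints`, `fake_create_vertex`, and the dict shapes are hypothetical, and the fake server simply mimics the `1:Sarah`-style IDs mentioned above. Only the `vid_mapping` name and the `outV`/`inV` fields come from the description.

```python
from typing import Callable, Dict, List, Tuple

Vertex = Dict[str, object]
Edge = Dict[str, object]


def remap_edge_endpoints(
    vertices: List[Vertex],
    edges: List[Edge],
    create_vertex: Callable[[Vertex], str],
) -> Tuple[List[Edge], List[Edge]]:
    """Create vertices, then point each edge at the IDs the server returned."""
    # LLM-predicted ID -> ID actually assigned when the vertex was created.
    vid_mapping: Dict[str, str] = {}
    for vertex in vertices:
        vid_mapping[str(vertex["id"])] = create_vertex(vertex)

    resolved: List[Edge] = []
    rejected: List[Edge] = []
    for edge in edges:
        out_v = vid_mapping.get(str(edge["outV"]))
        in_v = vid_mapping.get(str(edge["inV"]))
        if out_v is None or in_v is None:
            # An endpoint was never created, so committing would fail.
            rejected.append(edge)
            continue
        resolved.append({**edge, "outV": out_v, "inV": in_v})
    return resolved, rejected


def fake_create_vertex(vertex: Vertex) -> str:
    """Hypothetical stand-in for the server: numeric label ID plus name."""
    label_ids = {"person": 1, "company": 2}
    name = str(vertex["id"]).split(":", 1)[1]
    return f"{label_ids[str(vertex['label'])]}:{name}"


vertices = [
    {"id": "person:Sarah", "label": "person"},
    {"id": "company:Acme", "label": "company"},
]
edges = [
    {"label": "works_at", "outV": "person:Sarah", "inV": "company:Acme"},
    {"label": "knows", "outV": "person:Sarah", "inV": "person:Bob"},
]
resolved, rejected = remap_edge_endpoints(vertices, edges, fake_create_vertex)
```

Here the `works_at` edge ends up referencing `1:Sarah` and `2:Acme`, while the `knows` edge is set aside because `person:Bob` was never created. Rejecting unresolved endpoints up front, rather than letting the commit raise, matches the "resolve or reject edge endpoints before graph commit" item in the commit log.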
TODO
- … `PropertyGraphExtract.run()` and `Commit2Graph`; `filter_item()` currently stringifies non-string values while commit-time validation expects typed values.
- … `CUSTOMIZE_STRING` path, such as `CUSTOMIZE_NUMBER`, `CUSTOMIZE_UUID`, and `AUTOMATIC`.

Test plan
- … `type` fields are accepted from grouped `vertices`/`edges` arrays
- … `person:Sarah` are normalized to schema-derived IDs such as `1:Sarah`
- … `CUSTOMIZE_STRING` IDs are preserved and committed with `addVertex(..., id=...)`

🤖 Generated with Claude Code
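As an illustration of the normalization the test plan checks, the sketch below derives an ID from a schema entry. Everything here is an assumption made for the example: the `normalize_vertex_id` helper, the shape of the schema dicts, and the rule of joining several primary-key values with `!` are not taken from the diff. Only the `1:Sarah` format and the `CUSTOMIZE_STRING` behaviour come from the description.

```python
from typing import Dict, Optional


def normalize_vertex_id(vertex: Dict, schema_labels: Dict[str, Dict]) -> Optional[str]:
    """Return the ID to commit for a vertex, or None if it cannot be derived."""
    label = schema_labels.get(vertex.get("label"))
    if label is None:
        return None  # label not in the schema

    strategy = label.get("id_strategy")
    if strategy == "CUSTOMIZE_STRING":
        # Custom string IDs are preserved as provided.
        return str(vertex["id"])
    if strategy == "PRIMARY_KEY":
        props = vertex.get("properties", {})
        keys = label.get("primary_keys", [])
        if not keys or any(key not in props for key in keys):
            return None  # a primary-key value is missing
        # Assumed format: "<numeric label id>:<pk values joined by '!'>".
        return f"{label['id']}:{'!'.join(str(props[key]) for key in keys)}"
    return None  # other strategies are left out of this sketch


schema_labels = {
    "person": {"id": 1, "id_strategy": "PRIMARY_KEY", "primary_keys": ["name"]},
    "doc": {"id": 2, "id_strategy": "CUSTOMIZE_STRING"},
}
sarah = {"id": "person:Sarah", "label": "person", "properties": {"name": "Sarah"}}
report = {"id": "report-2024", "label": "doc", "properties": {}}

sarah_id = normalize_vertex_id(sarah, schema_labels)
report_id = normalize_vertex_id(report, schema_labels)
```

With these inputs the LLM-predicted `person:Sarah` becomes `1:Sarah`, and the custom string ID `report-2024` passes through unchanged.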