
fix(graph): resolve edge creation failure due to vertex ID mismatch #331

Merged: imbajin merged 7 commits into apache:main from linmengmeng-1314:fix/edge-vid-mapping on May 19, 2026

@linmengmeng-1314 (Contributor) commented May 18, 2026

Summary

  • Fix edge creation failure when loading extracted graph data into HugeGraph
  • The HugeGraph server assigns vertex IDs (e.g., 1:Sarah) using numeric label IDs, which differ from LLM-predicted IDs (e.g., person:Sarah) that edges reference
  • Add vid_mapping to track the ID mapping and update edge outV/inV after vertex creation
  • Normalize extracted graph IDs and edge endpoints after LLM extraction, including LLM outputs that omit item type fields and vertices with custom string IDs
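For readers unfamiliar with the "missing item type fields" case, here is a minimal sketch of inferring each item's type from the array it was grouped under. It assumes the LLM returns a JSON object with top-level `vertices` and `edges` arrays; `infer_item_types` is a hypothetical name and not the parser actually shipped in hugegraph-llm.

```python
from typing import Any, Dict, List


def infer_item_types(extracted: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten grouped LLM output into a list of items that each carry a "type".

    Items under "vertices" default to type "vertex" and items under "edges"
    default to type "edge"; a "type" the LLM did emit is left unchanged.
    """
    items: List[Dict[str, Any]] = []
    for group_key, default_type in (("vertices", "vertex"), ("edges", "edge")):
        for item in extracted.get(group_key, []):
            normalized = dict(item)  # shallow copy so the caller's dict is not mutated
            normalized.setdefault("type", default_type)
            items.append(normalized)
    return items


raw = {
    "vertices": [{"label": "person", "id": "person:Sarah", "properties": {"name": "Sarah"}}],
    "edges": [{"label": "knows", "outV": "person:Sarah", "inV": "person:Tom", "properties": {}}],
}
print([item["type"] for item in infer_item_types(raw)])  # ['vertex', 'edge']
```

Because the group key already tells us what an item is, the prompt no longer has to ask the LLM to repeat a per-item `type` field.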

Problem

When using "Load into GraphDB" to import extracted graph data, edges fail to be created with java.lang.IllegalArgumentException. The root cause is that the LLM generates vertex IDs using label names (e.g., person:Sarah), but the HugeGraph server uses numeric label IDs (e.g., 1:Sarah). Edge references still point to the LLM-predicted IDs, causing vertex lookup failures.
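To make the mismatch concrete, the sketch below derives the ID a vertex should carry from its vertex-label schema instead of trusting the LLM's guess. It is a simplified illustration: the schema dict shape and `derive_vertex_id` are hypothetical, only PRIMARY_KEY and CUSTOMIZE_STRING are handled, and joining multiple primary-key values with `!` (without escaping) is an assumption about HugeGraph's ID format rather than a verified detail.

```python
from typing import Any, Dict


def derive_vertex_id(vertex: Dict[str, Any], label_schema: Dict[str, Any]) -> str:
    """Return the vertex ID to use at commit time (hypothetical helper).

    PRIMARY_KEY:      "<numeric label id>:<primary-key value(s)>", e.g. "1:Sarah"
    CUSTOMIZE_STRING: the explicit id emitted by the LLM, kept as-is
    """
    strategy = label_schema["id_strategy"]
    if strategy == "PRIMARY_KEY":
        values = [str(vertex["properties"][key]) for key in label_schema["primary_keys"]]
        # Assumption: multiple primary-key values are joined with "!"; escaping is ignored
        return f"{label_schema['id']}:{'!'.join(values)}"
    if strategy == "CUSTOMIZE_STRING":
        if not vertex.get("id"):
            raise ValueError("CUSTOMIZE_STRING vertex has no explicit id")
        return str(vertex["id"])
    raise NotImplementedError(f"id strategy {strategy!r} is not covered by this sketch")


person = {"id": 1, "name": "person", "id_strategy": "PRIMARY_KEY", "primary_keys": ["name"]}
llm_vertex = {"label": "person", "id": "person:Sarah", "properties": {"name": "Sarah"}}
print(derive_vertex_id(llm_vertex, person))  # 1:Sarah
```

The LLM-predicted `person:Sarah` uses the label name, while the derived ID uses the numeric label ID `1`, which is the form the server actually stores.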

Solution

  • Save the original vertex ID before the server creates the vertex
  • Build a mapping from original ID to actual server-assigned ID
  • Apply the mapping when creating edges (with fallback to original ID for backward compatibility)
  • Derive PRIMARY_KEY vertex IDs from schema after LLM extraction
  • Reject edges whose endpoints cannot be resolved to vertices in the same extracted output
  • Preserve explicit CUSTOMIZE_STRING vertex IDs and pass them to HugeGraph when creating vertices
  • Align few-shot prompt examples with the current extraction contract
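The first three steps above (save the original ID, build the mapping, remap edges with a fallback) can be sketched as follows. This is not the code in `commit_to_hugegraph`: `commit_with_vid_mapping` is a hypothetical function, and `create_vertex` / `create_edge` are assumed callbacks standing in for the HugeGraph client, with `create_vertex` assumed to return the server-assigned vertex ID.

```python
from typing import Any, Callable, Dict, List


def commit_with_vid_mapping(
    vertices: List[Dict[str, Any]],
    edges: List[Dict[str, Any]],
    create_vertex: Callable[[Dict[str, Any]], str],
    create_edge: Callable[[Dict[str, Any]], None],
) -> Dict[str, str]:
    """Create vertices, remember original id -> server id, then create remapped edges."""
    vid_mapping: Dict[str, str] = {}
    for vertex in vertices:
        original_id = vertex.get("id")     # save the LLM-predicted id before creation
        server_id = create_vertex(vertex)  # assumed to return the id the server assigned
        if original_id is not None:
            vid_mapping[original_id] = server_id
    for edge in edges:
        mapped = dict(edge)
        # Fall back to the original id when no mapping exists (backward compatibility)
        mapped["outV"] = vid_mapping.get(edge["outV"], edge["outV"])
        mapped["inV"] = vid_mapping.get(edge["inV"], edge["inV"])
        create_edge(mapped)
    return vid_mapping


def fake_create_vertex(vertex: Dict[str, Any]) -> str:
    # Stand-in for the server: assigns "<label id>:<name>" with label id 1
    return "1:" + vertex["properties"]["name"]


created_edges: List[Dict[str, Any]] = []
mapping = commit_with_vid_mapping(
    [{"id": "person:Sarah", "label": "person", "properties": {"name": "Sarah"}},
     {"id": "person:Tom", "label": "person", "properties": {"name": "Tom"}}],
    [{"label": "knows", "outV": "person:Sarah", "inV": "person:Tom", "properties": {}}],
    fake_create_vertex,
    created_edges.append,
)
print(created_edges[0]["outV"], created_edges[0]["inV"])  # 1:Sarah 1:Tom
```

The fallback keeps the old behaviour intact: when the LLM already predicted the server's ID, or an edge points at a vertex created outside this batch, the endpoint passes through unchanged.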

Flow

LLM output
  |
  v
Parse JSON / infer item type
  |
  v
Normalize vertices by schema
  - PRIMARY_KEY: person:Sarah -> 1:Sarah
  - CUSTOMIZE_STRING: keep explicit id
  |
  v
Validate edge endpoints
  - outV/inV must resolve to extracted vertices
  - labels must match source_label/target_label
  |
  v
Commit vertices
  - remember original id -> server id
  |
  v
Commit edges with mapped endpoints
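The "Validate edge endpoints" step in the flow can be sketched as a filter that keeps an edge only when both endpoints resolve to extracted vertices whose labels match the edge label's `source_label` / `target_label`. The sketch assumes vertex IDs and edge endpoints have already been normalized (so both sides use `1:Sarah`-style IDs); `validate_edges` and the `edge_schema` dict shape are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def validate_edges(
    vertices: List[Dict[str, Any]],
    edges: List[Dict[str, Any]],
    edge_schema: Dict[str, Dict[str, str]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split edges into (kept, rejected) based on endpoint resolution and labels."""
    label_by_id = {v["id"]: v["label"] for v in vertices}
    kept: List[Dict[str, Any]] = []
    rejected: List[Dict[str, Any]] = []
    for edge in edges:
        schema = edge_schema.get(edge["label"])
        out_label = label_by_id.get(edge["outV"])  # None if the endpoint is unresolved
        in_label = label_by_id.get(edge["inV"])
        if (
            schema is not None
            and out_label == schema["source_label"]
            and in_label == schema["target_label"]
        ):
            kept.append(edge)
        else:
            rejected.append(edge)
    return kept, rejected


vertices = [
    {"id": "1:Sarah", "label": "person"},
    {"id": "1:Tom", "label": "person"},
    {"id": "2:Acme", "label": "company"},
]
edge_schema = {
    "knows": {"source_label": "person", "target_label": "person"},
    "works_at": {"source_label": "person", "target_label": "company"},
}
edges = [
    {"label": "knows", "outV": "1:Sarah", "inV": "1:Tom"},     # valid
    {"label": "works_at", "outV": "1:Sarah", "inV": "1:Tom"},  # wrong target label
    {"label": "knows", "outV": "1:Sarah", "inV": "1:Nobody"},  # unresolved endpoint
]
kept, rejected = validate_edges(vertices, edges, edge_schema)
print(len(kept), len(rejected))  # 1 2
```

Rejecting these edges before commit surfaces the problem at extraction time instead of as an `IllegalArgumentException` from the server.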

TODO

  • Handle valid cross-chunk edges without requiring both endpoints to appear in the same chunk output. This likely needs graph extraction normalization after all chunks are merged, or a shared vertex index across chunks.
  • Preserve or coerce property data types consistently across PropertyGraphExtract.run() and Commit2Graph; filter_item() currently stringifies non-string values while commit-time validation expects typed values.
  • Consider broader non-PRIMARY_KEY id strategy support beyond the current minimal CUSTOMIZE_STRING path, such as CUSTOMIZE_NUMBER, CUSTOMIZE_UUID, and AUTOMATIC.
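For the property data type TODO, one possible direction (not something this PR implements) is to coerce values according to the schema's declared property types instead of stringifying them. In this sketch `coerce_properties` and `COERCERS` are hypothetical, and the data-type names are assumed to match the property-key types in the schema.

```python
from typing import Any, Callable, Dict


def _to_bool(value: Any) -> bool:
    # bool("False") is True in Python, so parse string forms explicitly
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1", "yes")


# Hypothetical mapping from schema data-type names to Python coercers
COERCERS: Dict[str, Callable[[Any], Any]] = {
    "TEXT": str,
    "INT": int,
    "LONG": int,
    "FLOAT": float,
    "DOUBLE": float,
    "BOOLEAN": _to_bool,
}


def coerce_properties(properties: Dict[str, Any], data_types: Dict[str, str]) -> Dict[str, Any]:
    """Coerce property values to the data types declared in the schema.

    Keys without a declared (or known) data type are passed through unchanged.
    int()/float() raise ValueError on malformed input, which a caller could
    treat as "drop the item" rather than committing a wrongly typed value.
    """
    coerced: Dict[str, Any] = {}
    for key, value in properties.items():
        coerce = COERCERS.get(data_types.get(key, ""), lambda v: v)
        coerced[key] = coerce(value)
    return coerced


result = coerce_properties(
    {"name": "Sarah", "age": "30", "score": "4.5", "active": "true"},
    {"name": "TEXT", "age": "INT", "score": "DOUBLE", "active": "BOOLEAN"},
)
print(result)  # {'name': 'Sarah', 'age': 30, 'score': 4.5, 'active': True}
```

Running the same coercion in both `PropertyGraphExtract.run()` and `Commit2Graph` would give both stages a single source of truth for property types.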

Test plan

  • Extract graph data with an LLM
  • Load into GraphDB and verify both vertices and edges are created successfully
  • Verify existing functionality is not broken when the LLM already predicts the correct IDs
  • Verify LLM outputs without item type fields are accepted when vertices and edges are grouped into separate arrays
  • Verify label-name IDs such as person:Sarah are normalized to schema-derived IDs such as 1:Sarah
  • Verify explicit CUSTOMIZE_STRING IDs are preserved and committed with addVertex(..., id=...)

🤖 Generated with Claude Code

When loading graph data, the HugeGraph server assigns vertex IDs
(e.g., "1:Sarah") that differ from LLM-predicted IDs (e.g., "person:Sarah").
This causes edge creation to fail with IllegalArgumentException because
the edge references use the original LLM-predicted IDs which don't
match actual vertex IDs in the graph.

Add a vid_mapping to track the ID mapping and update edge references
after vertex creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dosubot added the size:XS (This PR changes 0-9 lines, ignoring generated files.) and bug (Something isn't working) labels May 18, 2026
@github-actions added the llm label May 18, 2026
@imbajin requested a review from Copilot May 18, 2026 13:31
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

imbajin added 2 commits May 19, 2026 12:21
- add regression coverage for LLM vertex ids differing from created ids
- verify edges use server-created vertex ids after vertex creation
- keep the change scoped to commit_to_hugegraph tests
- define deterministic vertex ids from schema label ids
- require edges to reference emitted vertex ids
- add prompt contract tests and validate prompt output with agent
@dosubot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:XS (This PR changes 0-9 lines, ignoring generated files.) label May 19, 2026
- add root AGENTS guidance for repo-wide module boundaries
- refactor hugegraph-llm AGENTS into concise module rules
- emphasize sufficient and effective test coverage
- align llm test commands with CI external-service skips
- infer vertex and edge type from grouped extraction arrays
- remove redundant item-level type requirement from extraction prompts
- align prompt example resources with deterministic vertex id rules
- add parser and prompt-contract regression coverage

- merge latest main changes for graph parsing and API tests
- resolve property graph extract test conflict by preserving both coverage sets
- keep grouped item type inference tests alongside fenced and flat JSON cases

@imbajin force-pushed the fix/edge-vid-mapping branch from 6715810 to dccff8f May 19, 2026 11:47
- derive primary-key vertex ids from schema after LLM extraction
- resolve or reject edge endpoints before graph commit
- align prompt examples with the extraction contract
- remove unrelated AGENTS changes from the PR diff
@imbajin force-pushed the fix/edge-vid-mapping branch from dccff8f to 6755a22 May 19, 2026 12:09
@imbajin (Member) left a comment:

@linmengmeng-1314 refactor the pipeline for V2

Could you try testing it when you're free? (Some TODOs are left in the description.)

@dosubot added the lgtm (This PR has been approved by a maintainer) label May 19, 2026
@imbajin merged commit a15965e into apache:main May 19, 2026
13 checks passed

Labels: bug (Something isn't working), lgtm (This PR has been approved by a maintainer), llm, size:L (This PR changes 100-499 lines, ignoring generated files.)

4 participants