Refactor Heuristic Evaluation Engine to LLM-Based Analysis #2
Conversation
Actually, I realized the mock data (`omniparser_client.py`) was incorrect: it included an `attributes={'level': 'h1'}` field that the real OmniParser does not output. Since the real model only returns bounding boxes and text, we cannot rely on `font_size` or `level` attributes. Please update the logic to infer hierarchy from the element's height (`bounds['height']`) instead. Here is an example of what OmniParser outputs:
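A hypothetical sketch of that output shape, using the field names described later in this PR (`type`, `bbox`, `interactivity`, `content`); the values are invented for illustration:

```python
# Hypothetical single parsed element -- the real model returns only geometry
# and text, never font_size or heading-level attributes.
example_element = {
    "type": "text",
    "bbox": [120.0, 48.0, 480.0, 84.0],  # [x1, y1, x2, y2]
    "interactivity": False,
    "content": "Dashboard Overview",
}
# Hierarchy must therefore be inferred from geometry, e.g. height = y2 - y1.
```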
@latishab |
I tested your code and ran into two issues:
Please make these changes, and if possible provide your API outputs (you can include example raw OmniParser outputs for testing so you don't have to rely on mock data anymore).
Summary
This PR replaces the rule-based `HeuristicEvaluationEngine` with an LLM-driven evaluation pipeline and updates the `UIElement` model to match real OmniParser output. The system now evaluates H1–H5 using GPT-4 with structured, measurable criteria.
Key Changes
1. OmniParser Alignment (Commit: 7255ae1)
Updated `UIElement` to align with the actual OmniParser format:
- Core fields: `type`, `bbox: [x1, y1, x2, y2]`, `interactivity`, `content`.
- Removed fields the real model does not output (`hover_state`, `confirmation`, etc.).
- Added computed `width` and `height` derived from `bbox`.
- Kept a `text` alias for backward compatibility.
- `from_dict()` now supports both the old and new formats.
- Added `infer_heading_level()` for text hierarchy calculation (see the sketch after this list).
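A minimal sketch of the updated model under the assumptions above; the old-format keys in `from_dict()` and the pixel thresholds in `infer_heading_level()` are illustrative guesses, not the PR's actual values:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class UIElement:
    # Fields mirroring real OmniParser output.
    type: str
    bbox: List[float]  # [x1, y1, x2, y2]
    interactivity: bool
    content: str

    @property
    def width(self) -> float:
        return self.bbox[2] - self.bbox[0]

    @property
    def height(self) -> float:
        return self.bbox[3] - self.bbox[1]

    @property
    def text(self) -> str:
        # Backward-compatible alias for the old field name.
        return self.content

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "UIElement":
        if "bbox" in data:
            # New format: real OmniParser output.
            return cls(
                type=data.get("type", "text"),
                bbox=list(data["bbox"]),
                interactivity=bool(data.get("interactivity", False)),
                content=data.get("content", ""),
            )
        # Old mock format (key names assumed): bounds dict plus text.
        # Any 'attributes' field is ignored -- the real model never emits it.
        b = data.get("bounds", {})
        x1, y1 = b.get("x", 0.0), b.get("y", 0.0)
        return cls(
            type=data.get("type", "text"),
            bbox=[x1, y1, x1 + b.get("width", 0.0), y1 + b.get("height", 0.0)],
            interactivity=bool(data.get("interactive", False)),
            content=data.get("text", ""),
        )

    def infer_heading_level(self) -> Optional[int]:
        # Hierarchy inferred from element height alone, per review feedback;
        # these thresholds are illustrative, not the committed values.
        if not self.content.strip():
            return None
        for level, min_height in enumerate((32, 26, 22, 18, 15), start=1):
            if self.height >= min_height:
                return level
        return None
```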
2. Measurable Criteria for H4 & H5 (Commit: 8957369)
Added concrete, structured criteria for LLM evaluation of higher-level heuristics (one possible encoding is sketched below).
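For illustration only, one way such criteria could be encoded; the criterion text below is invented, not the actual content of commit 8957369:

```python
# Hypothetical criteria registry -- the real criterion wording lives in the PR.
HEURISTIC_CRITERIA: dict[str, list[str]] = {
    "H4": [
        "Elements of the same type use consistent sizing across the screen.",
        "Equivalent actions are labelled with consistent wording.",
    ],
    "H5": [
        "Destructive actions are visually distinct from safe ones.",
        "Interactive elements expose visible affordances before use.",
    ],
}
```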
3. LLM-Based Evaluation Engine (Commit: ec31f70)
The core evaluation logic is now LLM-driven:
- Added `_serialize_elements_for_llm()`, `_evaluate_with_llm()`, and `_llm_explain_heuristic()`.
- `evaluate_heuristic()` is now fully LLM-based, supporting H1–H5.
- Results include an `llm_explanation` field.
- Reports are tagged with `evaluation_version="2.0.0-llm"` and `evaluation_method="llm-based"` (see the flow sketch after this list).
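A minimal sketch of how two of these helpers could fit together, assuming the OpenAI v1 Python client and GPT-4; the prompt wording and JSON response contract are illustrative, not the PR's actual implementation:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY (and OPENAI_BASE_URL, if set) from the environment

def _serialize_elements_for_llm(elements):
    # Compact JSON keeps the prompt small; include only the fields the model needs.
    return json.dumps(
        [{"type": e.type, "bbox": e.bbox, "interactivity": e.interactivity,
          "content": e.content} for e in elements]
    )

def _evaluate_with_llm(heuristic_id, criteria, elements, model="gpt-4"):
    # Score one heuristic against structured criteria and ask for JSON back.
    prompt = (
        f"Evaluate heuristic {heuristic_id} against these criteria:\n"
        + "\n".join(f"- {c}" for c in criteria)
        + "\n\nUI elements:\n"
        + _serialize_elements_for_llm(elements)
        + '\n\nRespond with JSON: {"score": 0-100, "violations": [], "explanation": ""}'
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)
```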
4. Other Fixes & Integrations
- Added support for the `OPENAI_BASE_URL` environment variable and improved LLM client initialization (see the sketch below).
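A sketch of what that initialization can look like with the OpenAI v1 client; the explicit keyword arguments simply mirror what the client already reads from the environment:

```python
import os
from openai import OpenAI

# The v1 client falls back to OPENAI_API_KEY and OPENAI_BASE_URL on its own;
# passing them explicitly just makes the override visible in the code.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.getenv("OPENAI_BASE_URL"),  # None -> default api.openai.com
)
```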
Impact Summary
Evaluation moves from deterministic rule checks to GPT-4-driven analysis, and `UIElement` now mirrors real OmniParser output instead of mock attributes.
Validation
`UIElement` serialization confirmed to match real OmniParser output.
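A quick round-trip check in that spirit, reusing the hypothetical `UIElement` sketch from section 1:

```python
raw = {"type": "text", "bbox": [0.0, 0.0, 200.0, 40.0],
       "interactivity": False, "content": "Title"}
el = UIElement.from_dict(raw)
assert el.width == 200.0 and el.height == 40.0
assert el.text == "Title"              # backward-compatible alias
assert el.infer_heading_level() == 1   # with the illustrative thresholds above
```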