-
Notifications
You must be signed in to change notification settings - Fork 3
[Refactor] Basic Query Parser #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! let one small comment
tests/test_query_parser.py
Outdated
| count_star = FunctionNode("COUNT", _alias="emp_count", _args=[ColumnNode("*")]) | ||
|
|
||
| # SELECT clause | ||
| select_clause = SelectNode({emp_name, dept_name, count_star}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uses lists [] instead of sets {} to make sure the order is maintained in clauses for formatting them back to SQL later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! PR updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a query parser that converts SQL strings to a custom AST structure via the mosql library. It adds a basic parsing test and implements the core parse() method along with helper functions for parsing different SQL clauses.
Key Changes:
- Implements
QueryParser.parse()method that converts SQL to AST via mosql intermediate JSON - Adds helper methods for parsing SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses
- Implements expression parsing, alias resolution, and operator/join type normalization
- Adds comprehensive test case
test_basic_parse()that validates parsing of complex queries with JOINs, aggregations, and multiple clauses
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| tests/test_query_parser.py | Adds test_basic_parse() function with comprehensive test case covering JOINs, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, and OFFSET; updates imports to include new node types |
| core/query_parser.py | Implements complete query parser with parse() method and 10+ helper methods for clause parsing, expression parsing, alias resolution, and type normalization |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def resolve_aliases(self, expr: Node) -> Node: | ||
| if isinstance(expr, OperatorNode): | ||
| # Recursively resolve aliases in operator operands | ||
| left = self.resolve_aliases(expr.children[0]) | ||
| right = self.resolve_aliases(expr.children[1]) | ||
| return OperatorNode(left, expr.name, right) | ||
| elif isinstance(expr, FunctionNode): | ||
| # Check if this function matches an aliased function from SELECT | ||
| if expr.alias is None: | ||
| for alias, aliased_expr in self.aliases.items(): | ||
| if isinstance(aliased_expr, FunctionNode): | ||
| if (expr.name == aliased_expr.name and | ||
| len(expr.children) == len(aliased_expr.children) and | ||
| all(expr.children[i] == aliased_expr.children[i] | ||
| for i in range(len(expr.children)))): | ||
| # This function matches an aliased one, use the alias | ||
| expr.alias = alias | ||
| break | ||
| return expr | ||
| elif isinstance(expr, ColumnNode): | ||
| # Check if this column matches an aliased column from SELECT | ||
| if expr.alias is None: | ||
| for alias, aliased_expr in self.aliases.items(): | ||
| if isinstance(aliased_expr, ColumnNode): | ||
| if (expr.name == aliased_expr.name and | ||
| expr.parent_alias == aliased_expr.parent_alias): | ||
| # This column matches an aliased one, use the alias | ||
| expr.alias = alias | ||
| break | ||
| return expr | ||
| else: | ||
| return expr |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The alias resolution in the HAVING clause iterates through all aliases to find matching FunctionNode or ColumnNode instances (lines 233-241, 246-252). For large SELECT clauses with many aliases, this could be inefficient. Consider optimizing by creating a separate lookup structure or only checking relevant aliases.
| elif isinstance(value, (dict, str)): | ||
| return [value] | ||
| else: | ||
| return [value] |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The docstring states that the function normalizes to list format and handles dict, str, and other types, but the final else clause (lines 27-28) will wrap ANY other type (including unexpected types) into a list without validation. This could hide bugs where unexpected types are passed. Consider either being more explicit about what "other types" are expected, or raising an error for truly unexpected types.
| return [value] | |
| raise TypeError( | |
| f"normalize_to_list: Unexpected type {type(value).__name__} for value {value!r}. " | |
| "Expected None, list, dict, or str." | |
| ) |
| predicates = set() | ||
| expr = self.parse_expression(having_dict) | ||
| # Check if this expression references an aliased function from SELECT | ||
| expr = self.resolve_aliases(expr) | ||
|
|
||
| predicates.add(expr) |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HavingNode constructor expects a list but receives a set here. The predicates variable is initialized as a set (line 185) and set operations are used (line 190), but HavingNode.init signature expects _predicates: List['Node'] (see core/ast/node.py:213). Change to use a list instead of a set.
| predicates = set() | |
| expr = self.parse_expression(having_dict) | |
| # Check if this expression references an aliased function from SELECT | |
| expr = self.resolve_aliases(expr) | |
| predicates.add(expr) | |
| predicates = [] | |
| expr = self.parse_expression(having_dict) | |
| # Check if this expression references an aliased function from SELECT | |
| expr = self.resolve_aliases(expr) | |
| predicates.append(expr) |
| # FROM clause with JOIN | ||
| join_condition = OperatorNode(emp_dept_id, "=", dept_id) | ||
| join_node = JoinNode(emp_table, dept_table, JoinType.INNER, join_condition) | ||
| from_clause = FromNode({join_node}) |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FromNode expects a list as its _sources parameter, but a set is passed here. According to core/ast/node.py line 195, FromNode.init expects _sources: List['Node']. Change {join_node} to [join_node] to match the expected type.
| from_clause = FromNode({join_node}) | |
| from_clause = FromNode([join_node]) |
| # List literals (for IN clauses) - convert to tuple for hashability | ||
| parsed = tuple(self.parse_expression(item) for item in expr) | ||
| return LiteralNode(parsed) |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LiteralNode constructor expects _value: str|int|float|bool|datetime|None (see core/ast/node.py:96), but a tuple is being passed here when parsing list expressions. Tuples are not in the allowed types for LiteralNode values. This could cause type errors or unexpected behavior. Consider either updating LiteralNode to support tuple types or handling list literals differently.
| # List literals (for IN clauses) - convert to tuple for hashability | |
| parsed = tuple(self.parse_expression(item) for item in expr) | |
| return LiteralNode(parsed) | |
| # List literals (for IN clauses) | |
| parsed = [self.parse_expression(item) for item in expr] | |
| return parsed |
| def parse_where(self, where_dict: dict) -> WhereNode: | ||
| predicates = set() | ||
| predicates.add(self.parse_expression(where_dict)) | ||
| return WhereNode(predicates) |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WhereNode constructor expects a list but receives a set here. The predicates variable is initialized as a set (line 164) and set operations are used (line 165), but WhereNode.init signature expects _predicates: List['Node'] (see core/ast/node.py:201). Change to use a list instead of a set.
| join_node = JoinNode(left_source, right_table, join_type, on_condition) | ||
| # The result of this JOIN becomes the new left source for potential next JOIN | ||
| left_source = join_node |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JoinNode constructor expects _left_table: 'TableNode' as its first parameter (see core/ast/node.py:163), but left_source could be a JoinNode from a previous iteration (as indicated by the comment on line 132 and assignment on line 133). This will cause a type error when chaining multiple joins. The JoinNode signature should be updated to accept Node types, or the parsing logic should be restructured to handle join chains differently.
|
|
||
| # [2] Our new code | ||
| # Any (JSON) -> AST (QueryNode) | ||
| self.aliases = {} |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The self.aliases instance variable is initialized within the parse() method rather than in __init__. This means the aliases dictionary persists between parse calls and could lead to stale alias references affecting subsequent parses. Consider either: 1) initializing self.aliases = {} in a proper __init__ method, or 2) passing aliases as a parameter through the helper methods instead of using instance state.
| # Pattern 3: Function call | ||
| # Special case: COUNT(*), SUM(*), etc. | ||
| if value == '*': | ||
| return FunctionNode(op_name, [ColumnNode('*')]) |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] In the test's expected AST, count_star is created with _args as a keyword argument, but in the parser (line 314), FunctionNode is called with positional arguments. While this works since _args is the second positional parameter after _name, using keyword arguments consistently would improve code clarity and match the test's style. Consider changing line 314 to: return FunctionNode(op_name, _args=[ColumnNode('*')])
| return FunctionNode(op_name, [ColumnNode('*')]) | |
| return FunctionNode(op_name, _args=[ColumnNode('*')]) |
| from core.ast.node import ( | ||
| Node, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'SubqueryNode' is not used.
Import of 'VarNode' is not used.
Import of 'VarSetNode' is not used.
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode | |
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, JoinNode |
Overview:
This PR implements the query parser and adds tests for converting an input SQL query string into our custom AST structure (SQL → JSON via mosql → custom AST). The base test case passes, and all additional test cases run without errors.
Code Changes:
tests/test_query_parser.py.parse()and its helper functions.