[Refactor] Basic Query Parser #90

HazelYuAhiru · 2025-11-21T20:05:45Z

Overview:

This PR implements the query parser and adds tests for converting an input SQL query string into our custom AST structure (SQL → JSON via mosql → custom AST). The base test case passes, and all additional test cases run without errors.

Code Changes:

Added a base parsing test in tests/test_query_parser.py.
Implemented parse() and its helper functions.

colinthebomb1

LGTM! let one small comment

colinthebomb1 · 2025-12-01T20:15:28Z

tests/test_query_parser.py

+    count_star = FunctionNode("COUNT", _alias="emp_count", _args=[ColumnNode("*")])
+
+    # SELECT clause
+    select_clause = SelectNode({emp_name, dept_name, count_star})


Uses lists [] instead of sets {} to make sure the order is maintained in clauses for formatting them back to SQL later

Thanks! PR updated

Copilot

Pull request overview

This PR implements a query parser that converts SQL strings to a custom AST structure via the mosql library. It adds a basic parsing test and implements the core parse() method along with helper functions for parsing different SQL clauses.

Key Changes:

Implements QueryParser.parse() method that converts SQL to AST via mosql intermediate JSON
Adds helper methods for parsing SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses
Implements expression parsing, alias resolution, and operator/join type normalization
Adds comprehensive test case test_basic_parse() that validates parsing of complex queries with JOINs, aggregations, and multiple clauses

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.

File	Description
tests/test_query_parser.py	Adds test_basic_parse() function with comprehensive test case covering JOINs, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, and OFFSET; updates imports to include new node types
core/query_parser.py	Implements complete query parser with parse() method and 10+ helper methods for clause parsing, expression parsing, alias resolution, and type normalization

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-04T00:08:50Z

core/query_parser.py

+    def resolve_aliases(self, expr: Node) -> Node:
+        if isinstance(expr, OperatorNode):
+            # Recursively resolve aliases in operator operands
+            left = self.resolve_aliases(expr.children[0])
+            right = self.resolve_aliases(expr.children[1])
+            return OperatorNode(left, expr.name, right)
+        elif isinstance(expr, FunctionNode):
+            # Check if this function matches an aliased function from SELECT
+            if expr.alias is None:
+                for alias, aliased_expr in self.aliases.items():
+                    if isinstance(aliased_expr, FunctionNode):
+                        if (expr.name == aliased_expr.name and 
+                            len(expr.children) == len(aliased_expr.children) and
+                            all(expr.children[i] == aliased_expr.children[i] 
+                                for i in range(len(expr.children)))):
+                            # This function matches an aliased one, use the alias
+                            expr.alias = alias
+                            break
+            return expr
+        elif isinstance(expr, ColumnNode):
+            # Check if this column matches an aliased column from SELECT
+            if expr.alias is None:
+                for alias, aliased_expr in self.aliases.items():
+                    if isinstance(aliased_expr, ColumnNode):
+                        if (expr.name == aliased_expr.name and 
+                            expr.parent_alias == aliased_expr.parent_alias):
+                            # This column matches an aliased one, use the alias
+                            expr.alias = alias
+                            break
+            return expr
+        else:
+            return expr


[nitpick] The alias resolution in the HAVING clause iterates through all aliases to find matching FunctionNode or ColumnNode instances (lines 233-241, 246-252). For large SELECT clauses with many aliases, this could be inefficient. Consider optimizing by creating a separate lookup structure or only checking relevant aliases.

Copilot · 2025-12-04T00:08:50Z

core/query_parser.py

+        elif isinstance(value, (dict, str)):
+            return [value]
+        else:
+            return [value]


[nitpick] The docstring states that the function normalizes to list format and handles dict, str, and other types, but the final else clause (lines 27-28) will wrap ANY other type (including unexpected types) into a list without validation. This could hide bugs where unexpected types are passed. Consider either being more explicit about what "other types" are expected, or raising an error for truly unexpected types.

Suggested change

return [value]

raise TypeError(

f"normalize_to_list: Unexpected type {type(value).__name__} for value {value!r}. "

"Expected None, list, dict, or str."

)

Copilot · 2025-12-04T00:08:51Z

core/query_parser.py

+        predicates = set()
+        expr = self.parse_expression(having_dict)
+        # Check if this expression references an aliased function from SELECT
+        expr = self.resolve_aliases(expr)
+
+        predicates.add(expr)


HavingNode constructor expects a list but receives a set here. The predicates variable is initialized as a set (line 185) and set operations are used (line 190), but HavingNode.init signature expects _predicates: List['Node'] (see core/ast/node.py:213). Change to use a list instead of a set.

Suggested change

predicates = set()

expr = self.parse_expression(having_dict)

# Check if this expression references an aliased function from SELECT

expr = self.resolve_aliases(expr)

predicates.add(expr)

predicates = []

expr = self.parse_expression(having_dict)

# Check if this expression references an aliased function from SELECT

expr = self.resolve_aliases(expr)

predicates.append(expr)

Copilot · 2025-12-04T00:08:51Z

tests/test_query_parser.py

+    # FROM clause with JOIN
+    join_condition = OperatorNode(emp_dept_id, "=", dept_id)
+    join_node = JoinNode(emp_table, dept_table, JoinType.INNER, join_condition)
+    from_clause = FromNode({join_node})


FromNode expects a list as its _sources parameter, but a set is passed here. According to core/ast/node.py line 195, FromNode.init expects _sources: List['Node']. Change {join_node} to [join_node] to match the expected type.

Suggested change

from_clause = FromNode({join_node})

from_clause = FromNode([join_node])

Copilot · 2025-12-04T00:08:51Z

core/query_parser.py

+            # List literals (for IN clauses) - convert to tuple for hashability
+            parsed = tuple(self.parse_expression(item) for item in expr)
+            return LiteralNode(parsed)


The LiteralNode constructor expects _value: str|int|float|bool|datetime|None (see core/ast/node.py:96), but a tuple is being passed here when parsing list expressions. Tuples are not in the allowed types for LiteralNode values. This could cause type errors or unexpected behavior. Consider either updating LiteralNode to support tuple types or handling list literals differently.

Suggested change

# List literals (for IN clauses) - convert to tuple for hashability

parsed = tuple(self.parse_expression(item) for item in expr)

return LiteralNode(parsed)

# List literals (for IN clauses)

parsed = [self.parse_expression(item) for item in expr]

return parsed

Copilot · 2025-12-04T00:08:54Z

core/query_parser.py

+    def parse_where(self, where_dict: dict) -> WhereNode:
+        predicates = set()
+        predicates.add(self.parse_expression(where_dict))
+        return WhereNode(predicates)


WhereNode constructor expects a list but receives a set here. The predicates variable is initialized as a set (line 164) and set operations are used (line 165), but WhereNode.init signature expects _predicates: List['Node'] (see core/ast/node.py:201). Change to use a list instead of a set.

Copilot · 2025-12-04T00:08:55Z

core/query_parser.py

+                    join_node = JoinNode(left_source, right_table, join_type, on_condition)
+                    # The result of this JOIN becomes the new left source for potential next JOIN
+                    left_source = join_node


The JoinNode constructor expects _left_table: 'TableNode' as its first parameter (see core/ast/node.py:163), but left_source could be a JoinNode from a previous iteration (as indicated by the comment on line 132 and assignment on line 133). This will cause a type error when chaining multiple joins. The JoinNode signature should be updated to accept Node types, or the parsing logic should be restructured to handle join chains differently.

Copilot · 2025-12-04T00:08:55Z

core/query_parser.py


        # [2] Our new code
        # Any (JSON) -> AST (QueryNode)
+        self.aliases = {}


The self.aliases instance variable is initialized within the parse() method rather than in __init__. This means the aliases dictionary persists between parse calls and could lead to stale alias references affecting subsequent parses. Consider either: 1) initializing self.aliases = {} in a proper __init__ method, or 2) passing aliases as a parameter through the helper methods instead of using instance state.

Copilot · 2025-12-04T00:08:55Z

core/query_parser.py

+                # Pattern 3: Function call
+                # Special case: COUNT(*), SUM(*), etc.
+                if value == '*':
+                    return FunctionNode(op_name, [ColumnNode('*')])


[nitpick] In the test's expected AST, count_star is created with _args as a keyword argument, but in the parser (line 314), FunctionNode is called with positional arguments. While this works since _args is the second positional parameter after _name, using keyword arguments consistently would improve code clarity and match the test's style. Consider changing line 314 to: return FunctionNode(op_name, _args=[ColumnNode('*')])

Suggested change

return FunctionNode(op_name, [ColumnNode('*')])

return FunctionNode(op_name, _args=[ColumnNode('*')])

Copilot · 2025-12-04T00:08:55Z

core/query_parser.py

+from core.ast.node import (
+    Node, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, 
+    LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode,
+    OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode


Import of 'SubqueryNode' is not used.
Import of 'VarNode' is not used.
Import of 'VarSetNode' is not used.

Suggested change

OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode

OrderByNode, OrderByItemNode, LimitNode, OffsetNode, JoinNode

HazelYuAhiru added 2 commits November 20, 2025 17:34

parser complete

43bf081

clean up

74ba863

HazelYuAhiru requested a review from colinthebomb1 November 21, 2025 20:05

colinthebomb1 previously approved these changes Dec 1, 2025

View reviewed changes

colinthebomb1 reviewed Dec 1, 2025

View reviewed changes

update select from set to list

fbbbbfc

HazelYuAhiru dismissed colinthebomb1’s stale review via fbbbbfc December 3, 2025 22:17

HazelYuAhiru marked this pull request as ready for review December 3, 2025 22:17

HazelYuAhiru requested a review from baiqiushi December 3, 2025 22:17

baiqiushi requested a review from Copilot December 4, 2025 00:02

Copilot started reviewing on behalf of baiqiushi December 4, 2025 00:03 View session

Copilot finished reviewing on behalf of baiqiushi December 4, 2025 00:06

Copilot AI reviewed Dec 4, 2025

View reviewed changes

-            return [value]
+            raise TypeError(
+                f"normalize_to_list: Unexpected type {type(value).__name__} for value {value!r}. "
+                "Expected None, list, dict, or str."
+            )

	from_clause = FromNode({join_node})
	from_clause = FromNode([join_node])

	return FunctionNode(op_name, [ColumnNode('*')])
	return FunctionNode(op_name, _args=[ColumnNode('*')])

	OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode
	OrderByNode, OrderByItemNode, LimitNode, OffsetNode, JoinNode

[Refactor] Basic Query Parser #90

Are you sure you want to change the base?

[Refactor] Basic Query Parser #90

Uh oh!

Conversation

HazelYuAhiru commented Nov 21, 2025

Overview:

Code Changes:

Uh oh!

colinthebomb1 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

colinthebomb1 Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

HazelYuAhiru Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

colinthebomb1 left a comment •

edited

Loading