feat: add timestamp normalization for BigQuery artifacts #977

niccoloalexander wants to merge 1 commit into elementary-data:master from
Conversation
- Introduced a new macro `normalize_artifact_timestamp_precision` to ensure timestamp precision for BigQuery.
- Updated existing macros to utilize this new function for the `execute_started_at`, `execute_completed_at`, `compile_started_at`, and `compile_completed_at` fields in `upload_run_results.sql` and `upload_source_freshness.sql`.
- Enhanced schema existence checks for BigQuery in `create_elementary_tests_schema.sql` and `get_elementary_tests_schema.sql` to improve compatibility.
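The macro body is not shown in this excerpt; a minimal sketch of what a precision-normalizing macro could look like, assuming it simply casts to BigQuery's microsecond-precision TIMESTAMP on that adapter (only the macro name comes from the PR; the argument name and the cast behavior are illustrative):

```sql
{% macro normalize_artifact_timestamp_precision(timestamp_expression) %}
  {# Illustrative sketch only: on BigQuery, cast to TIMESTAMP so artifact
     timestamps land at the microsecond precision the column expects;
     other adapters pass the expression through unchanged. #}
  {% if target.type == 'bigquery' %}
    cast({{ timestamp_expression }} as timestamp)
  {% else %}
    {{ timestamp_expression }}
  {% endif %}
{% endmacro %}
```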
📝 Walkthrough

These changes introduce timestamp precision normalization for BigQuery artifacts and replace adapter-based schema existence checks with explicit INFORMATION_SCHEMA queries.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
macros/edr/tests/on_run_start/create_elementary_tests_schema.sql (1)
9-23: Consider extracting this BigQuery schema-exists check into a shared macro. This logic is duplicated in macros/edr/tests/test_utils/get_elementary_tests_schema.sql (same SQL + result parsing pattern). A shared helper would keep behavior consistent and reduce drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/tests/on_run_start/create_elementary_tests_schema.sql` around lines 9 - 23, Extract the BigQuery schema-exists logic (the block that builds schema_exists_sql, calls elementary.run_query into schema_exists_result, and computes schema_exists) into a shared macro (e.g., get_elementary_tests_schema_exists) and use that macro from both create_elementary_tests_schema.sql and get_elementary_tests_schema.sql; the macro should accept database_name and tests_schema_name, run the same INFORMATION_SCHEMA query via elementary.run_query, parse rows[0][0] to an int boolean, and return the boolean so you can replace the duplicated schema_exists_sql/schema_exists_result/schema_exists code with a single macro call.
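Following the prompt above, a sketch of the shared helper it describes (the macro name and the INFORMATION_SCHEMA query shape are taken from the prompt's suggestion, not from existing code):

```sql
{% macro get_elementary_tests_schema_exists(database_name, tests_schema_name) %}
  {# Sketch of the deduplicated check: count matching schemas in
     INFORMATION_SCHEMA and return a boolean, as the prompt suggests. #}
  {% set schema_exists_sql %}
    select count(*) as schema_count
    from {{ database_name }}.INFORMATION_SCHEMA.SCHEMATA
    where schema_name = '{{ tests_schema_name }}'
  {% endset %}
  {% set schema_exists_result = elementary.run_query(schema_exists_sql) %}
  {% do return(schema_exists_result is not none
               and schema_exists_result.rows | length > 0
               and schema_exists_result.rows[0][0] | int > 0) %}
{% endmacro %}
```

Both call sites could then replace their inline schema_exists_sql / schema_exists_result / schema_exists blocks with a single call to this macro.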
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 99ee226d-2fc5-4e5e-86f8-97a52e265b73
📒 Files selected for processing (4)
- macros/edr/dbt_artifacts/upload_run_results.sql
- macros/edr/dbt_artifacts/upload_source_freshness.sql
- macros/edr/tests/on_run_start/create_elementary_tests_schema.sql
- macros/edr/tests/test_utils/get_elementary_tests_schema.sql
```
{% set schema_exists_result = elementary.run_query(schema_exists_sql) %}
{% set schema_exists = (
    schema_exists_result is not none
    and schema_exists_result.rows | length > 0
    and schema_exists_result.rows[0][0] | int > 0
) %}
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Verify direct row-index access patterns that may be incompatible with Fusion result objects.
rg -n --type sql '\.rows\[[0-9]+\]\[[0-9]+\]' macros
# Verify repository's existing Fusion-safe conversion pattern.
rg -n --type sql 'macro agate_to_dicts|is_dbt_fusion' macros/utils/run_queries/agate_to_dicts.sql
```

Repository: elementary-data/dbt-data-reliability
Length of output: 459

🏁 Script executed:

```shell
cat -n macros/utils/run_queries/agate_to_dicts.sql
```

Length of output: 2113

🏁 Script executed:

```shell
cat -n macros/edr/tests/on_run_start/create_elementary_tests_schema.sql | sed -n '1,30p'
```

Length of output: 1799

🏁 Script executed:

```shell
cat -n macros/edr/tests/test_utils/get_elementary_tests_schema.sql | sed -n '20,35p'
```

Length of output: 1002

🏁 Script executed:

```shell
cat -n macros/utils/run_queries/run_query.sql
```

Length of output: 2794

🏁 Script executed:

```shell
# Search for other uses of run_query to see if .rows access is common pattern
rg -n 'run_query.*\.rows' macros --type sql | head -20
```

Length of output: 62

🏁 Script executed:

```shell
# Search for tests that cover create_elementary_tests_schema
rg -n 'create_elementary_tests_schema|is_dbt_fusion' macros --type sql | grep -E '(test|fusion)'
```

Length of output: 1248

🏁 Script executed:

```shell
# Check for any fusion-related tests or integration tests
find . -type f -name '*test*.sql' -o -name '*fusion*.sql' 2>/dev/null | head -20
```

Length of output: 1248

🏁 Script executed:

```shell
# Look for any TODOs or comments about fusion compatibility in these files
rg -n -B2 -A2 'fusion|Fusion' macros/edr/tests/on_run_start/create_elementary_tests_schema.sql
```

Length of output: 62
Use Fusion-safe access for run_query results to prevent runtime failures.
Lines 18-19 access .rows[0][0] directly, which fails in dbt Fusion because the result structure differs between Fusion and non-Fusion modes. This breaks schema creation during on-run-start for BigQuery + Fusion.
Replace with elementary.agate_to_dicts(), which handles both modes:
Proposed fix:

```diff
 {% set schema_exists_result = elementary.run_query(schema_exists_sql) %}
-{% set schema_exists = (
-    schema_exists_result is not none
-    and schema_exists_result.rows | length > 0
-    and schema_exists_result.rows[0][0] | int > 0
-) %}
+{% set schema_count_rows = [] %}
+{% if schema_exists_result is not none %}
+    {% set schema_count_rows = elementary.agate_to_dicts(schema_exists_result) %}
+{% endif %}
+{% set schema_exists = (
+    schema_count_rows | length > 0
+    and schema_count_rows[0]["schema_count"] | int > 0
+) %}
```

The same fix is needed in macros/edr/tests/test_utils/get_elementary_tests_schema.sql (line 28).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/edr/tests/on_run_start/create_elementary_tests_schema.sql` around
lines 15 - 20, The current check reads schema_exists_result.rows[0][0] directly
which breaks in Fusion; replace the direct rows access by converting the
run_query result with elementary.agate_to_dicts(schema_exists_result) and then
inspect the first dict/value to determine existence (update the logic around
schema_exists_result and schema_exists); apply the same change in both locations
referencing elementary.run_query in create_elementary_tests_schema.sql and
get_elementary_tests_schema.sql so the code uses elementary.agate_to_dicts(...)
instead of schema_exists_result.rows[0][0].
```
{% set legacy_schema_exists_result = elementary.run_query(legacy_schema_exists_sql) %}
{% set legacy_schema_exists = (
    legacy_schema_exists_result is not none
    and legacy_schema_exists_result.rows | length > 0
    and legacy_schema_exists_result.rows[0][0] | int > 0
) %}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check the file and understand the context
cat -n macros/edr/tests/test_utils/get_elementary_tests_schema.sql | head -40
```

Length of output: 2221

🏁 Script executed:

```shell
# Find the agate_to_dict(s) macro
find . -name "*.sql" -type f | xargs grep -l "agate_to_dict" | head -10
```

Length of output: 672

🏁 Script executed:

```shell
# Check if the macro is agate_to_dict or agate_to_dicts
rg "macro.*agate_to_dict" --type sql
```

Length of output: 163

🏁 Script executed:

```shell
# Examine the agate_to_dicts macro implementation
cat -n macros/utils/run_queries/agate_to_dicts.sql
```

Length of output: 2113

🏁 Script executed:

```shell
# Check for other usages of agate_to_dicts in the codebase to understand the pattern
rg "agate_to_dicts" --type sql -B2 -A2 | head -50
```

Length of output: 4307
legacy_schema_exists parsing is not Fusion-safe.
Lines 27–28 access .rows[0][0] directly. In dbt Fusion, run_query() returns a different result shape without a .rows attribute, breaking backward-compatibility schema detection.
Use elementary.agate_to_dicts() (as shown below) to handle both modes:
Proposed fix:

```diff
 {% set legacy_schema_exists_result = elementary.run_query(legacy_schema_exists_sql) %}
-{% set legacy_schema_exists = (
-    legacy_schema_exists_result is not none
-    and legacy_schema_exists_result.rows | length > 0
-    and legacy_schema_exists_result.rows[0][0] | int > 0
-) %}
+{% set legacy_schema_count_rows = [] %}
+{% if legacy_schema_exists_result is not none %}
+    {% set legacy_schema_count_rows = elementary.agate_to_dicts(legacy_schema_exists_result) %}
+{% endif %}
+{% set legacy_schema_exists = (
+    legacy_schema_count_rows | length > 0
+    and legacy_schema_count_rows[0]["schema_count"] | int > 0
+) %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/edr/tests/test_utils/get_elementary_tests_schema.sql` around lines 24-29, The current legacy_schema_exists computation reads
legacy_schema_exists_result.rows[0][0] which breaks in dbt Fusion because
elementary.run_query() can return a different shape; update the logic that sets
legacy_schema_exists (and the variable legacy_schema_exists_result from
elementary.run_query(legacy_schema_exists_sql)) to normalize the result via
elementary.agate_to_dicts(legacy_schema_exists_result) (or the equivalent
conversion) and then check the first row/value safely via the normalized
dict/list structure so both legacy and Fusion run_query() result shapes are
supported.
Summary by CodeRabbit

Release Notes