feat: add timestamp normalization for BigQuery artifacts#977

Open
niccoloalexander wants to merge 1 commit into elementary-data:master from niccoloalexander:feat/bigquery-adaptability-improvements

Conversation


@niccoloalexander niccoloalexander commented Mar 31, 2026

Issues:

  • Parsing fails for hyphenated BigQuery dataset names (e.g. "project-name") when using dbt Fusion
  • dbt artifact timestamps that Elementary writes to BigQuery and parses as timestamps carry nanosecond-level fractional seconds, which exceeds the precision BigQuery timestamps support

Changes:

  • Introduced a new macro normalize_artifact_timestamp_precision that truncates artifact timestamps to the fractional-second precision BigQuery supports.
  • Updated existing macros to use this new macro for the execute_started_at, execute_completed_at, compile_started_at, and compile_completed_at fields in upload_run_results.sql and upload_source_freshness.sql.
  • Enhanced the schema existence checks for BigQuery in create_elementary_tests_schema.sql and get_elementary_tests_schema.sql to improve compatibility.
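The steps above can be sketched as a dbt Jinja macro. This is a hypothetical illustration, not the PR's actual code; it assumes timestamp strings with no timezone suffix after the fractional seconds.

```sql
{% macro normalize_artifact_timestamp_precision(timestamp_str) %}
    {# Hypothetical sketch. BigQuery TIMESTAMP supports at most
       six fractional-second digits (microseconds), so anything
       beyond that is truncated. #}
    {% if target.type == 'bigquery' and timestamp_str and '.' in timestamp_str %}
        {% set parts = timestamp_str.split('.') %}
        {% do return(parts[0] ~ '.' ~ parts[1][:6]) %}
    {% endif %}
    {# Non-BigQuery targets (or timestamps without fractional
       seconds) pass through unchanged. #}
    {% do return(timestamp_str) %}
{% endmacro %}
```

Under this sketch, a nanosecond-precision value such as `2026-03-31 12:00:00.123456789` would be reduced to `2026-03-31 12:00:00.123456` before being written to BigQuery.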

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved timestamp handling in data artifact processing to ensure consistent precision across different data warehouse platforms
    • Enhanced schema validation logic to better support BigQuery compatibility

@github-actions
Contributor

👋 @niccoloalexander
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

These changes introduce timestamp precision normalization for BigQuery artifacts and replace adapter-based schema existence checks with explicit INFORMATION_SCHEMA queries for BigQuery targets across four macros in the EDR and test utilities modules.

Changes

Cohort / File(s) Summary
Timestamp Precision Normalization
macros/edr/dbt_artifacts/upload_run_results.sql, macros/edr/dbt_artifacts/upload_source_freshness.sql
Added new normalize_artifact_timestamp_precision() macro to truncate fractional seconds to 6 digits for BigQuery; updated flatten_run_result and flatten_source_freshness to apply this normalization to all timing fields (execute_started_at, execute_completed_at, compile_started_at, compile_completed_at).
BigQuery Schema Existence Checks
macros/edr/tests/on_run_start/create_elementary_tests_schema.sql, macros/edr/tests/test_utils/get_elementary_tests_schema.sql
Refactored schema existence checks to branch on target.type: BigQuery now uses explicit INFORMATION_SCHEMA.SCHEMATA queries with case-insensitive schema name matching; other targets retain original adapter.check_schema_exists behavior.
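The refactored check described above might look roughly like this: a sketch under assumed variable names (database_name, tests_schema_name), using the Fusion-safe result parsing the review recommends below rather than the PR's exact code.

```sql
{% if target.type == 'bigquery' %}
    {# Sketch: query INFORMATION_SCHEMA directly, matching the
       schema name case-insensitively. #}
    {% set schema_exists_sql %}
        select count(*) as schema_count
        from `{{ database_name }}`.INFORMATION_SCHEMA.SCHEMATA
        where lower(schema_name) = lower('{{ tests_schema_name }}')
    {% endset %}
    {% set result = elementary.run_query(schema_exists_sql) %}
    {% set rows = elementary.agate_to_dicts(result) if result is not none else [] %}
    {% set schema_exists = rows | length > 0 and rows[0]['schema_count'] | int > 0 %}
{% else %}
    {# Other targets keep the original adapter-based behavior. #}
    {% set schema_exists = adapter.check_schema_exists(database_name, tests_schema_name) %}
{% endif %}
```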

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

Hop-hop, the timestamps now align,
Six fractious digits, oh so fine!
BigQuery queries seek and find,
Schemas checked with schema-mind! 🐰✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title Check | ✅ Passed | The PR title accurately summarizes the main change: adding timestamp normalization for BigQuery artifacts, which is the primary feature introduced across multiple macros.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
macros/edr/tests/on_run_start/create_elementary_tests_schema.sql (1)

9-23: Consider extracting this BigQuery schema-exists check into a shared macro.

This logic is duplicated in macros/edr/tests/test_utils/get_elementary_tests_schema.sql (same SQL + result parsing pattern). A shared helper would keep behavior consistent and reduce drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/on_run_start/create_elementary_tests_schema.sql` around
lines 9 - 23, Extract the BigQuery schema-exists logic (the block that builds
schema_exists_sql, calls elementary.run_query into schema_exists_result, and
computes schema_exists) into a shared macro (e.g.,
get_elementary_tests_schema_exists) and use that macro from both
create_elementary_tests_schema.sql and get_elementary_tests_schema.sql; the
macro should accept database_name and tests_schema_name, run the same
INFORMATION_SCHEMA query via elementary.run_query, parse rows[0][0] to an int
boolean, and return the boolean so you can replace the duplicated
schema_exists_sql/schema_exists_result/schema_exists code with a single macro
call.
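The shared helper suggested above could be sketched as follows; the macro name and signature are illustrative, not from the PR, and the result parsing follows the Fusion-safe pattern recommended elsewhere in this review.

```sql
{% macro bigquery_schema_exists(database_name, schema_name) %}
    {# Hypothetical shared helper: one place for the BigQuery
       INFORMATION_SCHEMA existence check, callable from both
       create_elementary_tests_schema.sql and
       get_elementary_tests_schema.sql. #}
    {% set schema_exists_sql %}
        select count(*) as schema_count
        from `{{ database_name }}`.INFORMATION_SCHEMA.SCHEMATA
        where lower(schema_name) = lower('{{ schema_name }}')
    {% endset %}
    {% set result = elementary.run_query(schema_exists_sql) %}
    {% set rows = elementary.agate_to_dicts(result) if result is not none else [] %}
    {% do return(rows | length > 0 and rows[0]['schema_count'] | int > 0) %}
{% endmacro %}
```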
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/on_run_start/create_elementary_tests_schema.sql`:
- Around line 15-20: The current check reads schema_exists_result.rows[0][0]
directly which breaks in Fusion; replace the direct rows access by converting
the run_query result with elementary.agate_to_dicts(schema_exists_result) and
then inspect the first dict/value to determine existence (update the logic
around schema_exists_result and schema_exists); apply the same change in both
locations referencing elementary.run_query in create_elementary_tests_schema.sql
and get_elementary_tests_schema.sql so the code uses
elementary.agate_to_dicts(...) instead of schema_exists_result.rows[0][0].

In `@macros/edr/tests/test_utils/get_elementary_tests_schema.sql`:
- Around line 24-29: The current legacy_schema_exists computation reads
legacy_schema_exists_result.rows[0][0] which breaks in dbt Fusion because
elementary.run_query() can return a different shape; update the logic that sets
legacy_schema_exists (and the variable legacy_schema_exists_result from
elementary.run_query(legacy_schema_exists_sql)) to normalize the result via
elementary.agate_to_dicts(legacy_schema_exists_result) (or the equivalent
conversion) and then check the first row/value safely via the normalized
dict/list structure so both legacy and Fusion run_query() result shapes are
supported.

---

Nitpick comments:
In `@macros/edr/tests/on_run_start/create_elementary_tests_schema.sql`:
- Around line 9-23: Extract the BigQuery schema-exists logic (the block that
builds schema_exists_sql, calls elementary.run_query into schema_exists_result,
and computes schema_exists) into a shared macro (e.g.,
get_elementary_tests_schema_exists) and use that macro from both
create_elementary_tests_schema.sql and get_elementary_tests_schema.sql; the
macro should accept database_name and tests_schema_name, run the same
INFORMATION_SCHEMA query via elementary.run_query, parse rows[0][0] to an int
boolean, and return the boolean so you can replace the duplicated
schema_exists_sql/schema_exists_result/schema_exists code with a single macro
call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99ee226d-2fc5-4e5e-86f8-97a52e265b73

📥 Commits

Reviewing files that changed from the base of the PR and between b8c7ab0 and 7e1be1b.

📒 Files selected for processing (4)
  • macros/edr/dbt_artifacts/upload_run_results.sql
  • macros/edr/dbt_artifacts/upload_source_freshness.sql
  • macros/edr/tests/on_run_start/create_elementary_tests_schema.sql
  • macros/edr/tests/test_utils/get_elementary_tests_schema.sql

Comment on lines +15 to +20
{% set schema_exists_result = elementary.run_query(schema_exists_sql) %}
{% set schema_exists = (
    schema_exists_result is not none
    and schema_exists_result.rows | length > 0
    and schema_exists_result.rows[0][0] | int > 0
) %}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify direct row-index access patterns that may be incompatible with Fusion result objects.
rg -n --type sql '\.rows\[[0-9]+\]\[[0-9]+\]' macros

# Verify repository's existing Fusion-safe conversion pattern.
rg -n --type sql 'macro agate_to_dicts|is_dbt_fusion' macros/utils/run_queries/agate_to_dicts.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 459


🏁 Script executed:

cat -n macros/utils/run_queries/agate_to_dicts.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 2113


🏁 Script executed:

cat -n macros/edr/tests/on_run_start/create_elementary_tests_schema.sql | sed -n '1,30p'

Repository: elementary-data/dbt-data-reliability

Length of output: 1799


🏁 Script executed:

cat -n macros/edr/tests/test_utils/get_elementary_tests_schema.sql | sed -n '20,35p'

Repository: elementary-data/dbt-data-reliability

Length of output: 1002


🏁 Script executed:

cat -n macros/utils/run_queries/run_query.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 2794


🏁 Script executed:

# Search for other uses of run_query to see if .rows access is common pattern
rg -n 'run_query.*\.rows' macros --type sql | head -20

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

# Search for tests that cover create_elementary_tests_schema
rg -n 'create_elementary_tests_schema|is_dbt_fusion' macros --type sql | grep -E '(test|fusion)'

Repository: elementary-data/dbt-data-reliability

Length of output: 1248


🏁 Script executed:

# Check for any fusion-related tests or integration tests
find . -type f -name '*test*.sql' -o -name '*fusion*.sql' 2>/dev/null | head -20

Repository: elementary-data/dbt-data-reliability

Length of output: 1248


🏁 Script executed:

# Look for any TODOs or comments about fusion compatibility in these files
rg -n -B2 -A2 'fusion|Fusion' macros/edr/tests/on_run_start/create_elementary_tests_schema.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 62


Use Fusion-safe access for run_query results to prevent runtime failures.

Lines 18-19 access .rows[0][0] directly, which fails in dbt Fusion because the result structure differs between Fusion and non-Fusion modes. This breaks schema creation during on-run-start for BigQuery + Fusion.

Replace with elementary.agate_to_dicts(), which handles both modes:

Proposed fix
             {% set schema_exists_result = elementary.run_query(schema_exists_sql) %}
-            {% set schema_exists = (
-                schema_exists_result is not none
-                and schema_exists_result.rows | length > 0
-                and schema_exists_result.rows[0][0] | int > 0
-            ) %}
+            {% set schema_count_rows = [] %}
+            {% if schema_exists_result is not none %}
+                {% set schema_count_rows = elementary.agate_to_dicts(schema_exists_result) %}
+            {% endif %}
+            {% set schema_exists = (
+                schema_count_rows | length > 0
+                and schema_count_rows[0]["schema_count"] | int > 0
+            ) %}

Same fix needed in macros/edr/tests/test_utils/get_elementary_tests_schema.sql (line 28).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/on_run_start/create_elementary_tests_schema.sql` around
lines 15 - 20, The current check reads schema_exists_result.rows[0][0] directly
which breaks in Fusion; replace the direct rows access by converting the
run_query result with elementary.agate_to_dicts(schema_exists_result) and then
inspect the first dict/value to determine existence (update the logic around
schema_exists_result and schema_exists); apply the same change in both locations
referencing elementary.run_query in create_elementary_tests_schema.sql and
get_elementary_tests_schema.sql so the code uses elementary.agate_to_dicts(...)
instead of schema_exists_result.rows[0][0].

Comment on lines +24 to +29
{% set legacy_schema_exists_result = elementary.run_query(legacy_schema_exists_sql) %}
{% set legacy_schema_exists = (
    legacy_schema_exists_result is not none
    and legacy_schema_exists_result.rows | length > 0
    and legacy_schema_exists_result.rows[0][0] | int > 0
) %}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check the file and understand the context
cat -n macros/edr/tests/test_utils/get_elementary_tests_schema.sql | head -40

Repository: elementary-data/dbt-data-reliability

Length of output: 2221


🏁 Script executed:

# Find the agate_to_dict(s) macro
find . -name "*.sql" -type f | xargs grep -l "agate_to_dict" | head -10

Repository: elementary-data/dbt-data-reliability

Length of output: 672


🏁 Script executed:

# Check if the macro is agate_to_dict or agate_to_dicts
rg "macro.*agate_to_dict" --type sql

Repository: elementary-data/dbt-data-reliability

Length of output: 163


🏁 Script executed:

# Examine the agate_to_dicts macro implementation
cat -n macros/utils/run_queries/agate_to_dicts.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 2113


🏁 Script executed:

# Check for other usages of agate_to_dicts in the codebase to understand the pattern
rg "agate_to_dicts" --type sql -B2 -A2 | head -50

Repository: elementary-data/dbt-data-reliability

Length of output: 4307


legacy_schema_exists parsing is not Fusion-safe.

Lines 27–28 access .rows[0][0] directly. In dbt Fusion, run_query() returns a different result shape without a .rows attribute, breaking backward-compatibility schema detection.

Use elementary.agate_to_dicts() (as shown below) to handle both modes:

Proposed fix
             {% set legacy_schema_exists_result = elementary.run_query(legacy_schema_exists_sql) %}
-            {% set legacy_schema_exists = (
-                legacy_schema_exists_result is not none
-                and legacy_schema_exists_result.rows | length > 0
-                and legacy_schema_exists_result.rows[0][0] | int > 0
-            ) %}
+            {% set legacy_schema_count_rows = [] %}
+            {% if legacy_schema_exists_result is not none %}
+                {% set legacy_schema_count_rows = elementary.agate_to_dicts(legacy_schema_exists_result) %}
+            {% endif %}
+            {% set legacy_schema_exists = (
+                legacy_schema_count_rows | length > 0
+                and legacy_schema_count_rows[0]["schema_count"] | int > 0
+            ) %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_utils/get_elementary_tests_schema.sql` around lines 24
- 29, The current legacy_schema_exists computation reads
legacy_schema_exists_result.rows[0][0] which breaks in dbt Fusion because
elementary.run_query() can return a different shape; update the logic that sets
legacy_schema_exists (and the variable legacy_schema_exists_result from
elementary.run_query(legacy_schema_exists_sql)) to normalize the result via
elementary.agate_to_dicts(legacy_schema_exists_result) (or the equivalent
conversion) and then check the first row/value safely via the normalized
dict/list structure so both legacy and Fusion run_query() result shapes are
supported.
