Add mitxonline_course_engagements_daily_report#1906
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a new dbt reporting model to materialize a previously-virtual Superset dataset, improving query performance by persisting derived fields for MITx Online daily course engagement.
Changes:
- Adds
mitxonline_course_engagements_daily_reportreporting model built onmarts__mitxonline_course_engagements_daily. - Introduces derived/reporting-friendly columns:
platform,courserun_is_current, andcourse_readable_id. - Documents the new model and adds a compound-uniqueness test in the reporting models YAML.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/ol_dbt/models/reporting/mitxonline_course_engagements_daily_report.sql |
New physical reporting model that extends the daily engagement mart with platform/current-run flag and course metadata. |
src/ol_dbt/models/reporting/_reporting__models.yml |
Adds documentation + tests for the new reporting model. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| , mitx_courses.course_readable_id | ||
| from mitxonline_engagements | ||
| left join mitx_courses | ||
| on mitxonline_engagements.course_number = mitx_courses.course_number |
There was a problem hiding this comment.
int__mitx__courses can have multiple rows per course_number (see model docs noting the field may not be unique), so joining it directly on course_number can duplicate engagement rows and break the uniqueness expectation / inflate metrics. Consider deduplicating mitx_courses to 1 row per course_number (e.g., row_number() partitioned by course_number and keep the preferred MITx Online row), or join via a more specific key (e.g., join int__mitxonline__course_runs to get course_id then join on mitxonline_course_id).
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| , case | ||
| when cast(substring(mitxonline_engagements.courserun_start_on, 1, 10) as date) <= current_date | ||
| and ( | ||
| mitxonline_engagements.courserun_end_on is null | ||
| or cast(substring(mitxonline_engagements.courserun_end_on, 1, 10) as date) >= current_date | ||
| ) | ||
| then true | ||
| else false | ||
| end as courserun_is_current |
There was a problem hiding this comment.
courserun_is_current is computed by casting the first 10 chars of the ISO8601 timestamps to date and using an inclusive end-date check (>= current_date). Elsewhere (e.g., int__combined__course_runs) the project derives this flag using from_iso8601_timestamp(...) and compares timestamps with an exclusive end boundary (courserun_end_on > current_date). Consider aligning with that approach to avoid off-by-one behavior for runs that end at midnight and to keep “current” semantics consistent across reports.
| ) | ||
|
|
||
| , mitx_courses as ( | ||
| select * from {{ ref('int__mitx__courses') }} |
There was a problem hiding this comment.
The mitx_courses CTE uses select * from int__mitx__courses, but the final query only needs mitxonline_course_id and course_readable_id. Selecting only the required columns will reduce scan/IO and avoid accidental coupling to upstream schema changes.
| select * from {{ ref('int__mitx__courses') }} | |
| select | |
| mitxonline_course_id | |
| , course_readable_id | |
| from {{ ref('int__mitx__courses') }} |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@copilot review this branch with most recent changes |
rachellougee
left a comment
There was a problem hiding this comment.
Looks good. Just a small comment but it's not a blocker
| , case | ||
| when | ||
| mitxonline_engagements.courserun_end_on is null | ||
| and from_iso8601_timestamp(mitxonline_engagements.courserun_start_on) <= current_date | ||
| then true | ||
| when | ||
| from_iso8601_timestamp(mitxonline_engagements.courserun_start_on) <= current_date | ||
| and from_iso8601_timestamp(mitxonline_engagements.courserun_end_on) > current_date | ||
| then true | ||
| else false |
There was a problem hiding this comment.
May be helpful to turn this into a macro so that it can be reused, since the same logic appear in another repor https://github.com/mitodl/ol-data-platform/blob/main/src/ol_dbt/models/reporting/mitxonline_video_engagements_w_video_counts.sql#L28-L38
- Create reporting model that extends marts__mitxonline_course_engagements_daily - Add platform column (MITx Online) - Add courserun_is_current boolean (checks if course is currently active) - Add course_readable_id from int__mitx__courses - Add full documentation and tests to _reporting__models.yml - Replaces Superset virtual dataset with physical table
…itxonline_course_engagements_daily_report (#1918) * Initial plan * Fix course join to use mitxonline_course_id instead of course_number to avoid duplicates Co-authored-by: quazi-h <59845076+quazi-h@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: quazi-h <59845076+quazi-h@users.noreply.github.com>
…engagements_daily_report (#1924) * Initial plan * Fix courserun_is_current and select * in mitxonline_course_engagements_daily_report Co-authored-by: quazi-h <59845076+quazi-h@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: quazi-h <59845076+quazi-h@users.noreply.github.com>
…8601 timestamps, selective CTE columns (#1928) * Initial plan * Replace select * with explicit columns in mitxonline_course_engagements_daily_report Co-authored-by: quazi-h <59845076+quazi-h@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: quazi-h <59845076+quazi-h@users.noreply.github.com>
- Replace inline case statement with is_courserun_current macro - Improves code maintainability and reusability - Provides cross-database compatibility (Trino, DuckDB, StarRocks) - Addresses PR feedback
6e83c79 to
c9e46a6
Compare
Resolved conflict in _reporting__models.yml by keeping both model definitions: - mitxonline_course_engagements_daily_report (from this branch) - combined_video_engagements_counts_report (from main)
What are the relevant tickets?
https://github.com/mitodl/hq/issues/7644
Description (What does it do?)
This PR converts the Superset virtual dataset
marts__mitxonline_course_engagements_dailyinto a physical dbt reporting model to improve query performance in Superset.The new
mitxonline_course_engagements_daily_reportmodel extends the existingmarts__mitxonline_course_engagements_dailymart by adding three columns that were previously calculated in the virtual dataset:platform- Set to 'MITx Online' for consistent cross-platform filteringcourserun_is_current- Boolean indicating if the course run is currently active (checks if current date falls between courserun start and end dates)course_readable_id- Course identifier joined fromint__mitx__coursesfor additional course metadataHow can this be tested?
Build and test the model:
dbt build --select mitxonline_course_engagements_daily_report --vars 'schema_suffix: <your_username>' --target dev_productionOnce this PR is merged and deployed, the corresponding Superset virtual dataset can be replaced with this physical table.