diff --git a/roads_management_insights/rmi_sample_queries/README.md b/roads_management_insights/rmi_sample_queries/README.md
new file mode 100644
index 0000000..b3b800f
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/README.md
@@ -0,0 +1,115 @@
+# Roads Management Insights (RMI) - Sample Query Library
+
+This folder contains a curated library of SQL queries and interactive notebooks designed for the Roads Management Insights (RMI) BigQuery dataset. These assets are organized by analytical persona to help users quickly find the right starting point for their specific use case.
+
+## 📂 Folder Structure
+
+- **`/queries`**: A collection of verified SQL queries categorized by persona. These queries demonstrate best practices for querying the RMI data model, including travel time analysis, bottleneck detection, and spatial joins.
+- **`/notebooks`**: Interactive Jupyter notebooks ready to be opened in Google Colab or BigQuery Studio. These notebooks provide a guided environment to execute the sample queries against the RMI sample dataset.
+
+## 🚀 Getting Started
+
+1. **Browse Queries**: Explore the personas below to find working SQL samples for your specific business questions.
+2. **Interactive Analysis**: Open the `.ipynb` files in the `/notebooks` directory for a guided, hands-on experience.
+3. **Dataset**: All queries are designed to run against the `boston_oct_2025_sample_data` shared dataset.
+
+---
+
+## 👥 Query Catalog by Persona
+
+### Traffic Operations Manager
+*Real-time monitoring and immediate bottleneck detection.*
+
+1. **Peak Hour Delay Analysis**: What is the average travel time delay during the morning peak (7-9 AM) for the top 10 most congested routes?
+ * [View SQL](queries/traffic_operations_manager/tom1_peak_hour_delay.sql)
+2. **Persistent Bottlenecks**: Which road segments (speed reading intervals, or SRIs) have been in a 'TRAFFIC_JAM' state most frequently?
+ * [View SQL](queries/traffic_operations_manager/tom2_persistent_bottlenecks.sql)
+3. **Operational Health Check**: Which routes are currently flagged with a 'LOW_ROAD_USAGE' validation error?
+ * [View SQL](queries/traffic_operations_manager/tom3_operational_health.sql)
+4. **Data Collection Latency**: Are there any active routes that have stopped sending data near the end of the snapshot period?
+ * [View SQL](queries/traffic_operations_manager/tom4_data_latency.sql)
+5. **Significant Event Detection**: Which routes experienced a travel time more than double their static baseline?
+ * [View SQL](queries/traffic_operations_manager/tom5_significant_event_detection.sql)
+
+### Urban Planner
+*Long-term trends and infrastructure planning.*
+
+1. **Long-Term Corridor Performance**: What has been the week-over-week trend in the average delay ratio for a specific corridor?
+ * [View SQL](queries/urban_planner/up1_corridor_trend.sql)
+2. **Traffic Monitoring Density**: Which geographic areas show the highest concentration of RMI route monitoring?
+ * [View SQL](queries/urban_planner/up3_monitoring_density.sql)
+3. **Weekend vs. Weekday Trends**: How does average travel time in the afternoon (2-5 PM) differ between weekdays and weekends?
+ * [View SQL](queries/urban_planner/up4_weekend_vs_weekday.sql)
+4. **Before-and-After Impact Analysis**: Has the average travel time on routes passing through a recent construction zone improved?
+ * [View SQL](queries/urban_planner/up2_impact_analysis.sql)
+5. **Geofenced Congestion**: Within a specific downtown polygon, which routes are currently seeing travel times more than 50% above baseline?
+ * [View SQL](queries/urban_planner/up5_geofenced_congestion.sql)
+
+### Data Scientist
+*Statistical analysis and predictive modeling.*
+
+1. **Outlier Detection (IQR)**: Identify travel time records that are statistical outliers for a specific route.
+ * [View SQL](queries/data_scientist/ds1_outlier_detection.sql)
+2. **Route Integrity Audit**: Which routes have a captured geometry that deviates significantly from the intended length?
+ * [View SQL](queries/data_scientist/ds4_route_integrity_audit.sql)
+3. **Persistent Unreliability Audit**: Group consecutive travel time spikes into failure windows (streaks).
+ * [View SQL](queries/data_scientist/ds5_reliability_ranking.sql)
+4. **Route Similarity Clustering**: Group routes based on their diurnal morning, midday, and evening delay profiles.
+ * [View SQL](queries/data_scientist/ds2_similarity_clustering.sql)
+5. **Predictive Feature Engineering**: Generate a high-quality, gap-aware feature set with regularized hourly grids.
+ * [View SQL](queries/data_scientist/ds3_feature_engineering.sql)
+6. **Travel Time Forecasting (ARIMA)**: Train and backtest a predictive model for future travel times.
+ * [View SQL](queries/data_scientist/ds6_travel_time_forecasting.sql)
+7. **Zero-Shot Forecasting (TimesFM)**: Forecast next-day traffic for multiple routes simultaneously without training.
+ * [View SQL](queries/data_scientist/ds7_zero_shot_forecasting.sql)
+
+### RMI Planner
+*Business value and monitoring scale.*
+
+1. **Usage Growth Projection**: Forecast data volume and compute spend for larger route fleets.
+ * [View SQL](queries/rmi_planner/rmip1_usage_projection.sql)
+2. **Customer ROI (Value at Risk)**: Quantify the total hours of delay across critical corridors.
+ * [View SQL](queries/rmi_planner/rmip2_customer_roi.sql)
+3. **Road Segment Estimation**: Estimate physical scale of the addressable monitoring network.
+ * [View SQL](queries/rmi_planner/rmip3_segment_estimation.sql)
+4. **Administrative Geofencing**: Create reusable city boundaries for localized reporting.
+ * [View SQL](queries/rmi_planner/rmip4_area_boundary.sql)
+
+### Data Engineer
+*Data pipelines and analysis-ready datasets.*
+
+1. **Create Materialized Subset**: Create filtered, high-performance materialized views.
+ * [View SQL](queries/data_engineer/de1_materialized_view.sql)
+2. **Data Cleaning**: Produce a typed, cleaned version of route metadata.
+ * [View SQL](queries/data_engineer/de2_data_cleaning.sql)
+3. **SRI Flattening**: Transform nested arrays into flattened spatial records with distance metrics.
+ * [View SQL](queries/data_engineer/de3_sri_flattening.sql)
+4. **Attribute Extraction**: Pivot JSON attributes into distinct, typed columns.
+ * [View SQL](queries/data_engineer/de4_attribute_extraction.sql)
+5. **Data Freshness Audit**: Monitor latest arrival times across active routes.
+ * [View SQL](queries/data_engineer/de5_freshness_audit.sql)
+6. **Automated Status History**: Capture daily snapshots of route status using scheduled queries.
+ * [View SQL](queries/data_engineer/de7_routes_status_snapshot.sql)
+
+### BigQuery Admin
+*Platform health, cost governance, and performance optimization.*
+
+1. **Metadata Inventory**: Low-cost metadata check of row count and storage size for RMI tables.
+ * [View SQL](queries/bigquery_admin/bqa0_metadata_inventory.sql)
+2. **Scan Volume Monitoring**: Identify users or service accounts generating high scan volume.
+ * [View SQL](queries/bigquery_admin/bqa1_scan_volume.sql)
+3. **Cost Attribution Audit**: Identify jobs missing the mandatory naming prefix in their job IDs.
+ * [View SQL](queries/bigquery_admin/bqa2_cost_attribution.sql)
+4. **Identify Derived Resources**: Audit tables or views derived from the core RMI dataset.
+ * [View SQL](queries/bigquery_admin/bqa3_derived_resources.sql)
+5. **Repeated Query Patterns**: Detect frequent patterns (joins, JSON extraction) for proactive optimization.
+ * [View SQL](queries/bigquery_admin/bqa4_query_patterns.sql)
+6. **Partition Pruning Audit**: Identify queries performing expensive full table scans instead of using partition filters.
+ * [View SQL](queries/bigquery_admin/bqa5_partition_pruning.sql)
+7. **Data Complexity Audit**: Audit spatial complexity (vertex count) and metadata size of actual records.
+ * [View SQL](queries/bigquery_admin/bqa6_data_complexity_audit.sql)
+
+
+## 📄 License
+
+Copyright 2026 Google LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0).
diff --git a/roads_management_insights/rmi_sample_queries/notebooks/BigQuery_Admin_Samples.ipynb b/roads_management_insights/rmi_sample_queries/notebooks/BigQuery_Admin_Samples.ipynb
new file mode 100644
index 0000000..b1fd0bc
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/notebooks/BigQuery_Admin_Samples.ipynb
@@ -0,0 +1,480 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n# Copyright 2026 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# https://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMI Sample Queries: BigQuery Admin (GA)\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook contains sample queries for the **BigQuery Admin** persona, specifically for the **GA** stage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "import pandas as pd\n",
+ "\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "project_id = 'your-project-id' #@param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Writable Dataset\n",
+ "\n",
+ "Several queries in this notebook (e.g., those creating Materialized Views, Models, or Views) require a **writable dataset** within your own project. \n",
+ "**Note**: The source `boston_oct_2025_sample_data` dataset is a read-only subscription and cannot be used to store new resources.\n",
+ "\n",
+ "Run the cell below to create a new dataset (e.g., `rmi_analysis`) in your project if you haven't already.\n",
+ "\n",
+ "**Important**: When running queries that create new BigQuery resources (e.g., tables, views, models) outside of these `%%bigquery` magic cells, remember to manually prefix the job ID with `rmisqlfactory_` for proper tracking. For example: `bq --job_id=rmisqlfactory_your-descriptive-job-name query ...`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset_id = \"rmi_analysis\" #@param {type:\"string\"}\n",
+ "! bq --location=US mk --dataset {project_id}:{dataset_id}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## GA (General Availability)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa0_metadata_inventory.sql\n",
+ "**Business Question**: How can I quickly check the row count and storage size of all RMI tables using low-cost metadata queries?\n\nProduct Stage: GA\n\nEstimated Bytes Processed: N/A (Metadata Query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa0_metadata_inventory\n",
+ "/*\n",
+ " This query uses INFORMATION_SCHEMA.PARTITIONS to provide a high-level \n",
+ " overview of table scale and data accumulation trends. \n",
+ " It reads system metadata rather than table data, so the scan is minimal;\n",
+ " on-demand pricing still bills INFORMATION_SCHEMA queries at a 10 MB minimum.\n",
+ "*/\n",
+ "\n",
+ "SELECT\n",
+ " table_name,\n",
+ " CASE \n",
+ " WHEN partition_id IS NULL OR partition_id = '__UNPARTITIONED__' THEN 'UNPARTITIONED' \n",
+ " ELSE partition_id \n",
+ " END as partition_id,\n",
+ " total_rows,\n",
+ " ROUND(total_logical_bytes / POW(1024, 2), 2) as size_mb,\n",
+ " last_modified_time\n",
+ "FROM `boston_oct_2025_sample_data.INFORMATION_SCHEMA.PARTITIONS`\n",
+ "WHERE table_name IN ('historical_travel_time', 'recent_roads_data', 'routes_status')\n",
+ " AND (partition_id IS NULL OR partition_id != '__NULL__')\n",
+ "ORDER BY table_name, partition_id DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa1_scan_volume.sql\n",
+ "**Business Question**: Which users or service accounts are generating the highest scan volume against RMI tables this month?\n\nUse Case: Enables cost governance by identifying 'heavy' consumers of the RMI dataset. Administrators can use this data to justify budget reallocations or suggest query optimizations to specific teams.\n\nProduct Stage: GA (Uses BigQuery INFORMATION_SCHEMA)\n\nEstimated Bytes Processed: N/A (Metadata Query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa1_scan_volume\n",
+ "/*\n",
+ " AUDIT PATTERN: Job Metadata Analysis\n",
+ " This query scans the system-managed JOBS view. It calculates total data scanned \n",
+ " (billed bytes) for any query that mentions core RMI table names.\n",
+ " \n",
+ " Note: Replace 'region-us' with the specific region where your dataset resides.\n",
+ "*/\n",
+ "\n",
+ "SELECT\n",
+ " user_email,\n",
+ " -- Convert bytes to GB for readable billing analysis\n",
+ " SUM(total_bytes_billed) / POW(1024, 3) AS total_gb_billed,\n",
+ " COUNT(*) AS job_count,\n",
+ " -- Average scan size helps distinguish between 'many small queries' vs 'one massive scan'\n",
+ " AVG(total_bytes_billed) / POW(1024, 3) AS avg_gb_per_job\n",
+ "FROM `region-us`.INFORMATION_SCHEMA.JOBS\n",
+ "WHERE creation_time BETWEEN TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH) AND CURRENT_TIMESTAMP()\n",
+ " AND job_type = 'QUERY'\n",
+ " -- Heuristic filter: Look for queries mentioning RMI core tables\n",
+ " AND (\n",
+ " query LIKE '%historical_travel_time%' \n",
+ " OR query LIKE '%recent_roads_data%' \n",
+ " OR query LIKE '%routes_status%'\n",
+ " )\n",
+ "GROUP BY 1\n",
+ "ORDER BY total_gb_billed DESC\n",
+ "LIMIT 10;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa2_cost_attribution.sql\n",
+ "**Business Question**: Identify any BigQuery jobs missing the mandatory 'rmisqlfactory_' prefix in their job IDs.\n\nUse Case: Ensures compliance with project governance standards. Consistent job ID prefixing is required for accurate cost attribution and auditing of RMI-related analysis.\n\nProduct Stage: GA (Uses BigQuery INFORMATION_SCHEMA)\n\nEstimated Bytes Processed: N/A (Metadata Query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa2_cost_attribution\n",
+ "/*\n",
+ " NOTE: 'rmisqlfactory_' is the mandatory job ID prefix for this workspace.\n",
+ " This allows administrators to filter billing logs and correlate spend \n",
+ " with specific personas or tools.\n",
+ " \n",
+ " SCOPE NOTE: Replace 'JOBS' with 'JOBS_BY_ORGANIZATION' if you have the \n",
+ " necessary permissions to audit spend across multiple projects. That view \n",
+ " does not expose query text, so drop the 'query' column and LIKE filters there.\n",
+ "*/\n",
+ "\n",
+ "SELECT\n",
+ " job_id,\n",
+ " user_email,\n",
+ " creation_time,\n",
+ " total_bytes_billed,\n",
+ " -- Provide the query text to help identify the source of the non-compliant job\n",
+ " query\n",
+ "FROM `region-us`.INFORMATION_SCHEMA.JOBS\n",
+ "WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)\n",
+ " AND job_type = 'QUERY'\n",
+ " -- Filter for jobs that missed the mandatory prefix\n",
+ " AND NOT STARTS_WITH(job_id, 'rmisqlfactory_')\n",
+ " -- Only audit jobs that were targeting RMI core tables\n",
+ " AND (\n",
+ " query LIKE '%historical_travel_time%' \n",
+ " OR query LIKE '%recent_roads_data%' \n",
+ " OR query LIKE '%routes_status%'\n",
+ " )\n",
+ "ORDER BY creation_time DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa3_derived_resources.sql\n",
+ "**Business Question**: What tables or views in my project are derived from the core RMI dataset?\n\nUse Case: Critical for lineage auditing and change management. Identifies 'shadow' analytical assets that may need to be updated or retired if the core RMI schema changes.\n\nProduct Stage: GA (Uses BigQuery INFORMATION_SCHEMA)\n\nEstimated Bytes Processed: N/A (Metadata Query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa3_derived_resources\n",
+ "/*\n",
+ " LINEAGE PATTERN: Metadata Dependency Mapping\n",
+ " This query scans the project metadata to find any VIEW definition that \n",
+ " references RMI core tables, as well as any tables, clones, or snapshots \n",
+ " whose names contain 'rmi' or 'road'.\n",
+ "*/\n",
+ "\n",
+ "-- Replace `your-project.your-dataset` with the location of your analytical workspace.\n",
+ "\n",
+ "SELECT\n",
+ " table_schema AS dataset_id,\n",
+ " table_name AS resource_name,\n",
+ " 'VIEW' AS type,\n",
+ " -- view_definition allows the admin to see the exact transformation logic\n",
+ " view_definition as lineage_detail\n",
+ "FROM `your-project.your-dataset.INFORMATION_SCHEMA.VIEWS`\n",
+ "WHERE (\n",
+ " view_definition LIKE '%historical_travel_time%' \n",
+ " OR view_definition LIKE '%recent_roads_data%' \n",
+ " OR view_definition LIKE '%routes_status%'\n",
+ " )\n",
+ "\n",
+ "UNION ALL\n",
+ "\n",
+ "-- Also identify base tables, Clones, and Snapshots with RMI-style names (clones and snapshots are cost-effective analytical patterns)\n",
+ "SELECT\n",
+ " table_schema AS dataset_id,\n",
+ " table_name AS resource_name,\n",
+ " table_type AS type,\n",
+ " 'N/A (Check Table Metadata for Base Table Lineage)' AS lineage_detail\n",
+ "FROM `your-project.your-dataset.INFORMATION_SCHEMA.TABLES`\n",
+ "WHERE (table_name LIKE '%rmi%' OR table_name LIKE '%road%')\n",
+ " AND table_type IN ('BASE TABLE', 'CLONE', 'SNAPSHOT')\n",
+ " -- Exclude the raw source tables themselves\n",
+ " AND table_name NOT IN ('historical_travel_time', 'recent_roads_data', 'routes_status')\n",
+ "\n",
+ "ORDER BY dataset_id, resource_name;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa4_query_patterns.sql\n",
+ "**Business Question**: What are the most frequent query patterns (joins, filters, JSON extractions) that could benefit from optimized downstream tables?\n\nUse Case: Enables proactive optimization. If many users are extracting the same JSON attribute or joining the same tables daily, the Admin can create a materialized view or flattened table to improve performance and reduce costs.\n\nProduct Stage: GA (Uses BigQuery INFORMATION_SCHEMA)\n\nEstimated Bytes Processed: N/A (Metadata Query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa4_query_patterns\n",
+ "/*\n",
+ " OPTIMIZATION PATTERN: Pattern Mining\n",
+ " This query analyzes your recent job history to identify common access patterns.\n",
+ " \n",
+ " Note: Replace 'region-us' with your actual BigQuery region.\n",
+ "*/\n",
+ "\n",
+ "WITH job_history AS (\n",
+ " SELECT\n",
+ " query,\n",
+ " total_bytes_processed\n",
+ " FROM `region-us`.INFORMATION_SCHEMA.JOBS\n",
+ " WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)\n",
+ " AND job_type = 'QUERY'\n",
+ " AND statement_type = 'SELECT'\n",
+ " -- Heuristic: Focus on RMI-related queries\n",
+ " AND (\n",
+ " query LIKE '%historical_travel_time%' \n",
+ " OR query LIKE '%recent_roads_data%' \n",
+ " OR query LIKE '%routes_status%'\n",
+ " )\n",
+ "),\n",
+ "patterns AS (\n",
+ " SELECT\n",
+ " query,\n",
+ " -- Regex: Identify specific JSON attributes extracted from 'route_attributes' (via JSON_EXTRACT_SCALAR or JSON_VALUE)\n",
+ " REGEXP_EXTRACT_ALL(query, r\"(?:JSON_EXTRACT_SCALAR|JSON_VALUE)\\(route_attributes, '([^']+)'\\)\") as extracted_attributes,\n",
+ " -- Regex: Detect if the query performs SRI flattening (expensive unnest)\n",
+ " REGEXP_CONTAINS(query, r\"UNNEST\\(speed_reading_intervals\\)\") as uses_sri_unnest,\n",
+ " -- Regex: Detect common join patterns\n",
+ " REGEXP_CONTAINS(query, r\"JOIN\\s+`[^`]+historical_travel_time`\") AND REGEXP_CONTAINS(query, r\"JOIN\\s+`[^`]+routes_status`\") as joins_hist_and_status\n",
+ " FROM job_history\n",
+ ")\n",
+ "SELECT\n",
+ " 'Frequent Attribute Extraction' as pattern_type,\n",
+ " attr as detail,\n",
+ " COUNT(*) as frequency\n",
+ "FROM patterns, UNNEST(extracted_attributes) as attr\n",
+ "GROUP BY 1, 2\n",
+ "\n",
+ "UNION ALL\n",
+ "\n",
+ "SELECT\n",
+ " 'Heavy SRI Processing' as pattern_type,\n",
+ " 'Uses UNNEST(speed_reading_intervals)' as detail,\n",
+ " COUNT(*) as frequency\n",
+ "FROM patterns\n",
+ "WHERE uses_sri_unnest\n",
+ "GROUP BY 1, 2\n",
+ "\n",
+ "UNION ALL\n",
+ "\n",
+ "SELECT\n",
+ " 'Common Table Joins' as pattern_type,\n",
+ " 'Joins historical_travel_time and routes_status' as detail,\n",
+ " COUNT(*) as frequency\n",
+ "FROM patterns\n",
+ "WHERE joins_hist_and_status\n",
+ "GROUP BY 1, 2\n",
+ "\n",
+ "ORDER BY frequency DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa5_partition_pruning.sql\n",
+ "**Business Question**: Are there queries performing full table scans on 'historical_travel_time' instead of using the 'record_time' partition filter?\n\nUse Case: Detects 'expensive' behavior. Since RMI datasets are partitioned by day on 'record_time', any query that doesn't include a temporal filter will scan the entire history, significantly increasing costs.\n\nProduct Stage: GA (Uses BigQuery INFORMATION_SCHEMA)\n\nEstimated Bytes Processed: N/A (Metadata Query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa5_partition_pruning\n",
+ "/*\n",
+ " AUDIT PATTERN: Pruning Heuristics\n",
+ " This query identifies large scans on the historical table. \n",
+ " It flags jobs whose 'total_bytes_processed' exceeds a fixed threshold \n",
+ " (100 GB by default); set the threshold well above the size of a single daily partition.\n",
+ " \n",
+ " Note: Replace 'region-us' with your actual BigQuery region.\n",
+ "*/\n",
+ "\n",
+ "SELECT\n",
+ " job_id,\n",
+ " user_email,\n",
+ " query,\n",
+ " -- Convert bytes to GB for readable performance auditing\n",
+ " total_bytes_processed / POW(1024, 3) AS gb_processed,\n",
+ " creation_time\n",
+ "FROM `region-us`.INFORMATION_SCHEMA.JOBS\n",
+ "WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)\n",
+ " AND job_type = 'QUERY'\n",
+ " AND statement_type = 'SELECT'\n",
+ " AND query LIKE '%historical_travel_time%'\n",
+ " -- Heuristic: Trigger audit if scan volume exceeds 100 GB (adjustable baseline)\n",
+ " -- This suggests the user might have missed a partition pruning filter (record_time)\n",
+ " AND total_bytes_processed > 100 * POW(1024, 3) \n",
+ "ORDER BY gb_processed DESC\n",
+ "LIMIT 10;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### bqa6_data_complexity_audit.sql\n",
+ "**Business Question**: What is the average spatial complexity (vertex count) and metadata size (routeAttributes) of my actual records?\n\nProduct Stage: GA\n\nEstimated Bytes Processed: ~450 MB (Full scan of geometry and attributes)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_bqa6_data_complexity_audit\n",
+ "/*\n",
+ " This query performs a deep audit of the data payload. \n",
+ " It is useful for understanding the impact of route precision and \n",
+ " custom attributes on storage and processing costs.\n",
+ "*/\n",
+ "\n",
+ "-- Historical Spatial Complexity\n",
+ "SELECT\n",
+ " 'historical_travel_time' as table_name,\n",
+ " COUNT(DISTINCT selected_route_id) as unique_routes,\n",
+ " AVG(BYTE_LENGTH(ST_ASBINARY(route_geometry))) as avg_geom_bytes,\n",
+ " AVG(ST_LENGTH(route_geometry) / 1000) as avg_route_length_km,\n",
+ " AVG(ST_NUMPOINTS(route_geometry)) as avg_num_points,\n",
+ " CAST(NULL AS FLOAT64) as avg_attr_bytes\n",
+ "FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ "WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ "\n",
+ "UNION ALL\n",
+ "\n",
+ "-- Recent Spatial Complexity (Enriched)\n",
+ "SELECT\n",
+ " 'recent_roads_data' as table_name,\n",
+ " COUNT(DISTINCT selected_route_id) as unique_routes,\n",
+ " AVG(BYTE_LENGTH(ST_ASBINARY(route_geometry))) as avg_geom_bytes,\n",
+ " AVG(ST_LENGTH(route_geometry) / 1000) as avg_route_length_km,\n",
+ " AVG(ST_NUMPOINTS(route_geometry)) as avg_num_points,\n",
+ " CAST(NULL AS FLOAT64) as avg_attr_bytes\n",
+ "FROM `boston_oct_2025_sample_data.recent_roads_data`\n",
+ "WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ "\n",
+ "UNION ALL\n",
+ "\n",
+ "-- Status Metadata Complexity\n",
+ "SELECT\n",
+ " 'routes_status' as table_name,\n",
+ " COUNT(DISTINCT selected_route_id) as unique_routes,\n",
+ " CAST(NULL AS FLOAT64) as avg_geom_bytes,\n",
+ " CAST(NULL AS FLOAT64) as avg_route_length_km,\n",
+ " CAST(NULL AS FLOAT64) as avg_num_points,\n",
+ " AVG(BYTE_LENGTH(route_attributes)) as avg_attr_bytes\n",
+ "FROM `boston_oct_2025_sample_data.routes_status`;\n",
+ ""
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/roads_management_insights/rmi_sample_queries/notebooks/Data_Engineer_Samples.ipynb b/roads_management_insights/rmi_sample_queries/notebooks/Data_Engineer_Samples.ipynb
new file mode 100644
index 0000000..fe73ca0
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/notebooks/Data_Engineer_Samples.ipynb
@@ -0,0 +1,491 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n# Copyright 2026 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# https://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMI Sample Queries: Data Engineer (GA)\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook contains sample queries for the **Data Engineer** persona, specifically for the **GA** stage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "import pandas as pd\n",
+ "\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "project_id = 'your-project-id' #@param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Writable Dataset\n",
+ "\n",
+ "Several queries in this notebook (e.g., those creating Materialized Views, Models, or Views) require a **writable dataset** within your own project. \n",
+ "**Note**: The source `boston_oct_2025_sample_data` dataset is a read-only subscription and cannot be used to store new resources.\n",
+ "\n",
+ "Run the cell below to create a new dataset (e.g., `rmi_analysis`) in your project if you haven't already.\n",
+ "\n",
+ "**Important**: When running queries that create new BigQuery resources (e.g., tables, views, models) outside of these `%%bigquery` magic cells, remember to manually prefix the job ID with `rmisqlfactory_` for proper tracking. For example: `bq --job_id=rmisqlfactory_your-descriptive-job-name query ...`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset_id = \"rmi_analysis\" #@param {type:\"string\"}\n",
+ "! bq --location=US mk --dataset {project_id}:{dataset_id}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## GA (General Availability)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### de1_materialized_view.sql\n",
+ "**Business Question**: Generate a query to create a 7-day materialized view of historical_travel_time for a specific corridor.\n\nProduct Stage: GA\n\nEstimated Bytes Processed: ~150 MB\n\nMetadata: Uses ALTER statements to apply technical descriptions to all columns and the view itself."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_de1_materialized_view\n",
+ "-- NOTE: The source dataset (e.g., `boston_oct_2025_sample_data`) is a read-only subscription from Analytics Hub.\n",
+ "-- This materialized view MUST be created in a separate, writable dataset within your project.\n",
+ "-- Replace `your-project.your-dataset` with your target location.\n",
+ "\n",
+ "CREATE MATERIALIZED VIEW IF NOT EXISTS `your-project.your-dataset.storrow_drive_view`\n",
+ "CLUSTER BY selected_route_id AS\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " record_time,\n",
+ " duration_in_seconds,\n",
+ " static_duration_in_seconds,\n",
+ " route_geometry\n",
+ "FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ "WHERE record_time >= TIMESTAMP_SUB(TIMESTAMP('2025-10-31'), INTERVAL 7 DAY)\n",
+ " AND display_name LIKE '%Storrow-Drive%';\n",
+ "\n",
+ "-- Applying view-level metadata\n",
+ "ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`\n",
+ "SET OPTIONS (\n",
+ " description=\"A fixed subset of RMI historical travel time data (records from 2025-10-24 onward) for the Storrow Drive corridor.\"\n",
+ ");\n",
+ "\n",
+ "-- Applying column-level metadata descriptions\n",
+ "ALTER COLUMN selected_route_id SET OPTIONS(description=\"Unique identifier for the SelectedRoute resource.\")\n",
+ "ON `your-project.your-dataset.storrow_drive_view`;\n",
+ "\n",
+ "ALTER COLUMN display_name SET OPTIONS(description=\"User-provided descriptive name for the route.\")\n",
+ "ON `your-project.your-dataset.storrow_drive_view`;\n",
+ "\n",
+ "ALTER COLUMN record_time SET OPTIONS(description=\"The UTC timestamp representing when the route data was computed.\")\n",
+ "ON `your-project.your-dataset.storrow_drive_view`;\n",
+ "\n",
+ "ALTER COLUMN duration_in_seconds SET OPTIONS(description=\"The traffic-aware duration of the route in seconds.\")\n",
+ "ON `your-project.your-dataset.storrow_drive_view`;\n",
+ "\n",
+ "ALTER COLUMN static_duration_in_seconds SET OPTIONS(description=\"The traffic-unaware (static) duration of the route in seconds.\")\n",
+ "ON `your-project.your-dataset.storrow_drive_view`;\n",
+ "\n",
+ "ALTER COLUMN route_geometry SET OPTIONS(description=\"The traffic-aware polyline geometry of the route as a GEOGRAPHY object.\")\n",
+ "ON `your-project.your-dataset.storrow_drive_view`;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### de2_data_cleaning.sql\n",
+ "**Business Question**: Write a query that produces a \"cleaned\" version of the routes_status table, correctly casting the route_length.\nProduct Stage: GA\nEstimated Bytes Processed: < 1 MB\nMetadata: Provides descriptions for transformed fields and the view itself."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_de2_data_cleaning\n",
+ "/*\n",
+ " PRE-REQUISITE: This query utilizes the custom routeAttribute 'route_length' \n",
+ " (intended physical length in meters), which has been pre-configured for \n",
+ " all routes in this sample dataset.\n",
+ "*/\n",
+ "\n",
+ "CREATE OR REPLACE VIEW `your-project.your-dataset.routes_status_cleaned`\n",
+ "(\n",
+ " selected_route_id OPTIONS(description=\"Unique identifier for the SelectedRoute resource.\"),\n",
+ " display_name OPTIONS(description=\"User-provided descriptive name for the route.\"),\n",
+ " status OPTIONS(description=\"Operational state (e.g., STATUS_RUNNING).\"),\n",
+ " validation_error OPTIONS(description=\"Reason for failure if status is INVALID.\"),\n",
+ " route_length_meters OPTIONS(description=\"The pre-computed intended route length in meters, cast from the custom 'route_length' routeAttribute.\")\n",
+ ")\n",
+ "OPTIONS(\n",
+ " description=\"A cleaned view of SelectedRoutes status, with the custom route_length attribute promoted to a typed column.\"\n",
+ ")\n",
+ "AS\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " status,\n",
+ " validation_error,\n",
+ " CAST(JSON_VALUE(route_attributes, '$.route_length') AS FLOAT64) AS route_length_meters\n",
+ "FROM `boston_oct_2025_sample_data.routes_status`\n",
+ "WHERE status != 'STATUS_INVALID';\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### de3_sri_flattening.sql\n",
+ "**Business Question**: Create an optimized script to transform the latest 30 minutes of nested SRI data into a flattened format with spatial progress metrics and quality filters.\nProduct Stage: GA\nEstimated Bytes Processed: ~10 MB (Optimized via Scripting and Static Partition Pruning)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_de3_sri_flattening\n",
+ "/*\n",
+ " BIGQUERY OPTIMIZATION PATTERN: Static vs. Dynamic Partition Pruning\n",
+ " \n",
+ " This query uses BigQuery Scripting (DECLARE/SET) to force \"Static Pruning\".\n",
+ " \n",
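+ " 1. Static Pruning (This Pattern): By resolving 'anchor_date' and 'latest_timestamp' \n",
+ " into script variables BEFORE the main SELECT, BigQuery treats them as constants. \n",
+ " This allows the optimizer to discard irrelevant partitions at planning time, \n",
+ " whereas a filter computed by an inline subquery (dynamic pruning) is only \n",
+ " resolved while the query runs and may not prune partitions.\n",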
+ " \n",
+ " 2. Geometry Integrity Check: To ensure high-quality analysis, this query:\n",
+ " a) Calculates 'length_deviation_ratio' against pre-computed attributes.\n",
+ " b) Excludes 'MultiLineString' geometries to ensure we only process single, \n",
+ " continuous paths (ST_LineString).\n",
+ " \n",
+ " 3. Noise Reduction: Final results exclude 'NORMAL' speed states and filter out \n",
+ " extremely short intervals (< 5 meters) that often represent GPS noise.\n",
+ "*/\n",
+ "\n",
+ "-- Step 1: Define the static anchor date to narrow down partitions\n",
+ "DECLARE anchor_date DATE DEFAULT '2025-10-31';\n",
+ "\n",
+ "-- Step 2: Find the exact latest timestamp and define the 30-minute window\n",
+ "DECLARE latest_timestamp TIMESTAMP;\n",
+ "SET latest_timestamp = (\n",
+ " SELECT MAX(record_time)\n",
+ " FROM `boston_oct_2025_sample_data.recent_roads_data`\n",
+ " WHERE record_time >= TIMESTAMP(anchor_date)\n",
+ ");\n",
+ "\n",
+ "-- Step 3: Execute the flattening logic for the latest 30-minute window\n",
+ "WITH base_intervals AS (\n",
+ " SELECT\n",
+ " r.selected_route_id,\n",
+ " r.record_time,\n",
+ " segment_offset as interval_index,\n",
+ " sri.speed as interval_speed_state,\n",
+ " -- Reconstruct the interval polyline from the array of interval points\n",
+ " ST_MAKELINE(sri.interval_coordinates) as interval_geometry,\n",
+ " -- Core metrics for integrity check\n",
+ " ST_LENGTH(r.route_geometry) as actual_route_length_meters,\n",
+ " CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) as intended_route_length_meters\n",
+ " FROM `boston_oct_2025_sample_data.recent_roads_data` r\n",
+ " JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id),\n",
+ " UNNEST(speed_reading_intervals) AS sri WITH OFFSET AS segment_offset\n",
+ " WHERE r.record_time >= TIMESTAMP(anchor_date)\n",
+ " -- Capture only records from the last 30 minutes\n",
+ " AND r.record_time > TIMESTAMP_SUB(latest_timestamp, INTERVAL 30 MINUTE)\n",
+ " -- Quality filter: Only process single, continuous paths\n",
+ " AND ST_GEOMETRYTYPE(r.route_geometry) = 'ST_LineString'\n",
+ "),\n",
+ "quality_filtered_intervals AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- Deviation between intended and actual geometry length\n",
+ " SAFE_DIVIDE(ABS(actual_route_length_meters - intended_route_length_meters), intended_route_length_meters) as length_deviation_ratio\n",
+ " FROM base_intervals\n",
+ " -- Filter for high-integrity geometries (e.g., < 5% deviation)\n",
+ " WHERE SAFE_DIVIDE(ABS(actual_route_length_meters - intended_route_length_meters), intended_route_length_meters) < 0.05\n",
+ "),\n",
+ "metrics_calculation AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " ST_LENGTH(interval_geometry) as interval_length_meters,\n",
+ " -- Roll-up sum of interval lengths to find cumulative distance from origin\n",
+ " SUM(ST_LENGTH(interval_geometry)) OVER (\n",
+ " PARTITION BY selected_route_id, record_time \n",
+ " ORDER BY interval_index\n",
+ " ) as cumulative_length_meters,\n",
+ " -- Count total intervals in the route for context\n",
+ " COUNT(*) OVER (\n",
+ " PARTITION BY selected_route_id, record_time\n",
+ " ) as total_intervals\n",
+ " FROM quality_filtered_intervals\n",
+ "),\n",
+ "position_ratios AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- The end of the previous interval is the start of the current interval\n",
+ " COALESCE(LAG(cumulative_length_meters) OVER (\n",
+ " PARTITION BY selected_route_id, record_time \n",
+ " ORDER BY interval_index\n",
+ " ), 0.0) as start_length_meters\n",
+ " FROM metrics_calculation\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " record_time,\n",
+ " interval_index,\n",
+ " total_intervals,\n",
+ " interval_speed_state,\n",
+ " interval_length_meters,\n",
+ " -- Rounded relative positions (0.000 to 1.000) within the trip\n",
+ " ROUND(SAFE_DIVIDE(start_length_meters, actual_route_length_meters), 3) as start_position_ratio,\n",
+ " ROUND(SAFE_DIVIDE(cumulative_length_meters, actual_route_length_meters), 3) as end_position_ratio,\n",
+ " length_deviation_ratio,\n",
+ " interval_geometry\n",
+ "FROM position_ratios\n",
+ "-- Filter for congested intervals and exclude noise (short intervals)\n",
+ "WHERE interval_speed_state != 'NORMAL'\n",
+ " AND interval_length_meters >= 5\n",
+ "ORDER BY selected_route_id, record_time, interval_index;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### de4_attribute_extraction.sql\n",
+ "**Business Question**: Write a query that pivots the JSON route_attributes into distinct columns.\nProduct Stage: GA\nEstimated Bytes Processed: < 1 MB\nMetadata: Enriches pivoted columns with business definitions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_de4_attribute_extraction\n",
+ "CREATE OR REPLACE VIEW `your-project.your-dataset.routes_enriched_attributes`\n",
+ "(\n",
+ " selected_route_id OPTIONS(description=\"Unique identifier for the SelectedRoute resource.\"),\n",
+ " region OPTIONS(description=\"The geographical business region extracted from routeAttributes.\"),\n",
+ " tier OPTIONS(description=\"The service tier (e.g. priority, standard) extracted from routeAttributes.\"),\n",
+ " priority OPTIONS(description=\"The operational priority level assigned during registration.\"),\n",
+ " route_length_meters OPTIONS(description=\"The intended physical length of the route in meters, cast to FLOAT64 from routeAttributes.\")\n",
+ ")\n",
+ "OPTIONS(\n",
+ " description=\"A denormalized view of SelectedRoute metadata, promoting custom JSON attributes into typed top-level columns.\"\n",
+ ")\n",
+ "AS\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " JSON_VALUE(route_attributes, '$.region') as region,\n",
+ " JSON_VALUE(route_attributes, '$.tier') as tier,\n",
+ " JSON_VALUE(route_attributes, '$.priority') as priority,\n",
+ " -- route_attributes values are always strings. Casting to FLOAT64 for numerical analysis.\n",
+ " CAST(JSON_VALUE(route_attributes, '$.route_length') AS FLOAT64) as route_length_meters\n",
+ "FROM `boston_oct_2025_sample_data.routes_status`\n",
+ "-- Example: Filtering by priority attribute\n",
+ "-- WHERE JSON_VALUE(route_attributes, '$.priority') = 'high';\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### de5_freshness_audit.sql\n",
+ "**Business Question**: Which active routes have stopped receiving updates, indicating potential data gaps?\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB\nMetadata: Provides descriptions for the audit results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_de5_freshness_audit\n",
+ "/*\n",
+ " AUDIT GOAL: Identify routes that are 'STATUS_RUNNING' but have no recent \n",
+ " records in historical_travel_time. This helps detect routes with \n",
+ " insufficient traffic or pipeline latency issues.\n",
+ "*/\n",
+ "\n",
+ "CREATE OR REPLACE VIEW `your-project.your-dataset.route_freshness_audit`\n",
+ "(\n",
+ " selected_route_id OPTIONS(description=\"Unique identifier for the SelectedRoute resource.\"),\n",
+ " display_name OPTIONS(description=\"Human-readable name of the route.\"),\n",
+ " last_updated OPTIONS(description=\"The timestamp of the most recent record found in historical_travel_time.\"),\n",
+ " hours_since_last_update OPTIONS(description=\"The age of the data in hours relative to the audit timestamp.\")\n",
+ ")\n",
+ "OPTIONS(\n",
+ " description=\"Operational audit view to identify active routes with missing or stale travel time data.\"\n",
+ ")\n",
+ "AS\n",
+ "WITH freshness AS (\n",
+ " SELECT\n",
+ " selected_route_id,\n",
+ " MAX(record_time) as last_updated\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " -- Scans the full sample month to find the latest record for every route\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " GROUP BY 1\n",
+ ")\n",
+ "SELECT\n",
+ " s.selected_route_id,\n",
+ " s.display_name,\n",
+ " f.last_updated,\n",
+ " -- Using '2025-11-01' as the reference 'Now' for this static sample dataset\n",
+ " TIMESTAMP_DIFF(TIMESTAMP('2025-11-01'), f.last_updated, HOUR) AS hours_since_last_update\n",
+ "FROM `boston_oct_2025_sample_data.routes_status` s\n",
+ "LEFT JOIN freshness f USING(selected_route_id)\n",
+ "-- Focus on routes that SHOULD be providing data\n",
+ "WHERE s.status = 'STATUS_RUNNING'\n",
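+ "-- NULLS FIRST surfaces routes with no records at all, which are the most severe gaps\n",
+ "ORDER BY hours_since_last_update DESC NULLS FIRST;\n",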
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### de7_routes_status_snapshot.sql\n",
+ "**Business Question**: How can I automate the historical tracking of my SelectedRoutes' status changes?\nProduct Stage: GA\nEstimated Bytes Processed: < 1 MB\nMetadata: Inherits column descriptions from routes_status and adds snapshot metadata."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_de7_routes_status_snapshot\n",
+ "/*\n",
+ " AUTOMATION EXAMPLE: \n",
+ " To schedule this snapshot to run every 24 hours using the bq CLI, run:\n",
+ " \n",
+ " bq mk \\\n",
+ " --transfer_config \\\n",
+ " --project_id=\"your-project-id\" \\\n",
+ " --data_source=scheduled_query \\\n",
+ " --display_name=\"Daily RMI Routes Status Snapshot\" \\\n",
+ " --target_dataset=\"your_dataset\" \\\n",
+ " --schedule=\"every 24 hours\" \\\n",
+ " --params='{\n",
+ " \"query\":\"INSERT INTO `your-project.your-dataset.routes_status_history` SELECT CURRENT_TIMESTAMP() as snapshot_time, * FROM `boston_oct_2025_sample_data.routes_status`\"\n",
+ " }'\n",
+ "*/\n",
+ "\n",
+ "-- STEP 1: Initialize the partitioned history table with enriched metadata\n",
+ "CREATE TABLE IF NOT EXISTS `your-project.your-dataset.routes_status_history` (\n",
+ " snapshot_time TIMESTAMP OPTIONS(description=\"The UTC timestamp when this snapshot was captured.\"),\n",
+ " selected_route_id STRING OPTIONS(description=\"Unique identifier for the SelectedRoute resource.\"),\n",
+ " display_name STRING OPTIONS(description=\"User-provided descriptive name for the route.\"),\n",
+ " status STRING OPTIONS(description=\"Current operational state (e.g., STATUS_RUNNING, STATUS_INVALID).\"),\n",
+ " validation_error STRING OPTIONS(description=\"Detailed reason if the route failed validation.\"),\n",
+ " low_road_usage_start_time TIMESTAMP OPTIONS(description=\"Timestamp when low road usage was first detected.\"),\n",
+ " route_attributes STRING OPTIONS(description=\"JSON string of custom business metadata.\")\n",
+ ")\n",
+ "PARTITION BY DATE(snapshot_time)\n",
+ "CLUSTER BY selected_route_id;\n",
+ "\n",
+ "-- STEP 2: The Periodic Append Logic (Manually executable version)\n",
+ "-- This statement appends the current state of all routes into the history table.\n",
+ "INSERT INTO `your-project.your-dataset.routes_status_history`\n",
+ "SELECT\n",
+ " CURRENT_TIMESTAMP() as snapshot_time,\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " status,\n",
+ " validation_error,\n",
+ " low_road_usage_start_time,\n",
+ " route_attributes\n",
+ "FROM `boston_oct_2025_sample_data.routes_status`;\n",
+ ""
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/roads_management_insights/rmi_sample_queries/notebooks/Data_Scientist_Samples.ipynb b/roads_management_insights/rmi_sample_queries/notebooks/Data_Scientist_Samples.ipynb
new file mode 100644
index 0000000..254661b
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/notebooks/Data_Scientist_Samples.ipynb
@@ -0,0 +1,663 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n# Copyright 2026 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# https://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMI Sample Queries: Data Scientist (GA)\n",
+ "\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook contains sample queries for the **Data Scientist** persona, specifically for the **GA** stage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "import pandas as pd\n",
+ "\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "project_id = 'your-project-id' #@param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Writable Dataset\n",
+ "\n",
+ "Several queries in this notebook (e.g., those creating Materialized Views, Models, or Views) require a **writable dataset** within your own project. \n",
+ "**Note**: The source `boston_oct_2025_sample_data` dataset is a read-only subscription and cannot be used to store new resources.\n",
+ "\n",
+ "Run the cell below to create a new dataset (e.g., `rmi_analysis`) in your project if you haven't already.\n",
+ "\n",
+ "**Important**: When you run queries that create new BigQuery resources (e.g., tables, views, models) outside of these `%%bigquery` magic cells, manually prefix the job ID with `msqlfactory--` for proper tracking. For example: `bq query --job_id=msqlfactory--your-descriptive-job-name ...`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset_id = \"rmi_analysis\" #@param {type:\"string\"}\n",
+ "! bq --location=US mk --dataset {project_id}:{dataset_id}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## GA (General Availability)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds1_outlier_detection.sql\n",
+ "**Business Question**: Which travel time records for a specific route are statistical outliers?\nUse Case: Automatically flags anomalous data points that could indicate extreme traffic events or potential data collection issues.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds1_outlier_detection\n",
+ "/*\n",
+ " QUALITY FILTERS:\n",
+ " 1. continuous_path: Excludes records where the geometry is not a single ST_LineString.\n",
+ " 2. length_integrity: Excludes records where actual physical length deviates by > 5% \n",
+ " from the intended 'route_length' attribute.\n",
+ "*/\n",
+ "\n",
+ "WITH quality_filtered_history AS (\n",
+ " SELECT\n",
+ " h.selected_route_id,\n",
+ " h.record_time,\n",
+ " h.duration_in_seconds,\n",
+ " ST_LENGTH(h.route_geometry) as actual_length,\n",
+ " CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) as intended_length\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time` h\n",
+ " JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id)\n",
+ " WHERE h.selected_route_id = 'route-4202493217'\n",
+ " AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- Quality filter: Only process single, continuous paths\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ " -- Quality filter: Length deviation check (< 5%)\n",
+ " AND SAFE_DIVIDE(ABS(ST_LENGTH(h.route_geometry) - CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)), CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)) < 0.05\n",
+ "),\n",
+ "stats AS (\n",
+ " SELECT\n",
+ " APPROX_QUANTILES(duration_in_seconds, 100)[OFFSET(25)] AS q1,\n",
+ " APPROX_QUANTILES(duration_in_seconds, 100)[OFFSET(75)] AS q3\n",
+ " FROM quality_filtered_history\n",
+ "),\n",
+ "outlier_thresholds AS (\n",
+ " SELECT\n",
+ " q1,\n",
+ " q3,\n",
+ " (q3 - q1) AS iqr,\n",
+ " q1 - (1.5 * (q3 - q1)) AS lower_bound,\n",
+ " q3 + (1.5 * (q3 - q1)) AS upper_bound\n",
+ " FROM stats\n",
+ ")\n",
+ "SELECT\n",
+ " h.record_time,\n",
+ " h.duration_in_seconds,\n",
+ " t.lower_bound,\n",
+ " t.upper_bound,\n",
+ " CASE \n",
+ " WHEN h.duration_in_seconds > t.upper_bound THEN 'High_Outlier'\n",
+ " WHEN h.duration_in_seconds < t.lower_bound THEN 'Low_Outlier'\n",
+ " END as outlier_type\n",
+ "FROM quality_filtered_history h, outlier_thresholds t\n",
+ "-- Filter for records outside the calculated IQR bounds\n",
+ "WHERE (h.duration_in_seconds > t.upper_bound OR h.duration_in_seconds < t.lower_bound)\n",
+ "ORDER BY h.record_time DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds2_similarity_clustering.sql\n",
+ "**Business Question**: Which routes exhibit similar traffic patterns based on their average peak-hour delay ratios?\nUse Case: Grouping routes by behavioral similarity allows planners to apply similar mitigation strategies to entire clusters of road segments rather than analyzing each route individually.\nProduct Stage: GA\nEstimated Bytes Processed: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds2_similarity_clustering\n",
+ "/*\n",
+ " INTERPRETATION GUIDE:\n",
+ " Routes assigned to the same 'cluster_id' share a similar diurnal traffic profile \n",
+ " (the relationship between AM, Midday, and PM delays).\n",
+ " \n",
+ " Example Interpretation:\n",
+ " - Cluster 1: Commuter Heavy (High AM/PM delay, low Midday).\n",
+ " - Cluster 2: Consistently Efficient (Delay ratio near 1.0 all day).\n",
+ " - Cluster 3: Midday Bottleneck (High Midday delay, typical AM/PM).\n",
+ "*/\n",
+ "\n",
+ "-- Step 1: Create the K-Means model.\n",
+ "-- NOTE: The source dataset (e.g., `boston_oct_2025_sample_data`) is a read-only subscription.\n",
+ "-- This model MUST be created in a separate, writable dataset within your project.\n",
+ "-- Replace `your-project.your-dataset` with your target location.\n",
+ "\n",
+ "CREATE OR REPLACE MODEL `your-project.your-dataset.route_clusters`\n",
+ "OPTIONS(model_type='kmeans', num_clusters=5) AS\n",
+ "SELECT\n",
+ " -- K-Means works with numerical features. We will use the delay ratios as features.\n",
+ " COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 7 AND 9 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_am_delay,\n",
+ " COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 12 AND 14 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_midday_delay,\n",
+ " COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 16 AND 18 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_pm_delay\n",
+ "FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ "WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ "GROUP BY selected_route_id;\n",
+ "\n",
+ "-- Step 2: Predict the cluster for each route using the trained model.\n",
+ "WITH route_features AS (\n",
+ " SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 7 AND 9 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_am_delay,\n",
+ " COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 12 AND 14 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_midday_delay,\n",
+ " COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 16 AND 18 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_pm_delay\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " GROUP BY 1, 2\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " CENTROID_ID AS cluster_id,\n",
+ " ROUND(avg_am_delay, 2) as am_ratio,\n",
+ " ROUND(avg_midday_delay, 2) as midday_ratio,\n",
+ " ROUND(avg_pm_delay, 2) as pm_ratio\n",
+ "FROM ML.PREDICT(MODEL `your-project.your-dataset.route_clusters`,\n",
+ " (SELECT * FROM route_features)\n",
+ ")\n",
+ "ORDER BY cluster_id, selected_route_id;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds3_feature_engineering.sql\n",
+ "**Business Question**: How can I prepare a high-quality, gap-aware feature set for training a predictive traffic model?\nUse Case: Demonstrates how to regularize a time-series using a timestamp grid. This ensures that window functions (LAG, AVG) accurately reflect chronological time even when records are missing due to quality filtering or detours.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds3_feature_engineering\n",
+ "/*\n",
+ " HANDLING MISSING DATA (DETOURS/GAPS):\n",
+ " By joining the RMI data with a generated 'time_grid', we identify missing records. \n",
+ " Downstream models can then decide how to handle these nulls (e.g., interpolation, \n",
+ " imputation, or masking), preventing window functions from 'collapsing' time gaps.\n",
+ "*/\n",
+ "\n",
+ "WITH quality_filtered_base AS (\n",
+ " SELECT\n",
+ " -- Truncating to hour to match the RMI collection interval\n",
+ " TIMESTAMP_TRUNC(h.record_time, HOUR) as record_hour,\n",
+ " h.duration_in_seconds,\n",
+ " ST_LENGTH(h.route_geometry) as actual_length,\n",
+ " CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) as intended_length\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time` h\n",
+ " JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id)\n",
+ " WHERE h.selected_route_id = 'route-4202493217'\n",
+ " AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- Quality filter: Only process single, continuous paths\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ "),\n",
+ "hourly_averages AS (\n",
+ " -- Aggregate to a single record per hour before regularizing\n",
+ " SELECT \n",
+ " record_hour,\n",
+ " AVG(duration_in_seconds) as avg_duration,\n",
+ " COUNT(*) as samples_in_hour\n",
+ " FROM quality_filtered_base\n",
+ " WHERE SAFE_DIVIDE(ABS(actual_length - intended_length), intended_length) < 0.05\n",
+ " GROUP BY 1\n",
+ "),\n",
+ "time_grid AS (\n",
+ " -- Generate a continuous hourly grid covering every hour of October 2025 (UTC)\n",
+ " SELECT hour\n",
+ " FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2025-10-01', '2025-10-31 23:00:00', INTERVAL 1 HOUR)) as hour\n",
+ "),\n",
+ "regularized_series AS (\n",
+ " SELECT\n",
+ " g.hour,\n",
+ " a.avg_duration as duration_in_seconds,\n",
+ " COALESCE(a.samples_in_hour, 0) as samples_in_hour,\n",
+ " IF(a.avg_duration IS NULL, TRUE, FALSE) as is_missing_data\n",
+ " FROM time_grid g\n",
+ " LEFT JOIN hourly_averages a ON g.hour = a.record_hour\n",
+ ")\n",
+ "SELECT\n",
+ " hour,\n",
+ " ROUND(duration_in_seconds, 2) as duration_in_seconds,\n",
+ " samples_in_hour,\n",
+ " is_missing_data,\n",
+ " -- Lagged features now accurately represent -1hr and -2hr regardless of data availability\n",
+ " ROUND(LAG(duration_in_seconds, 1) OVER (ORDER BY hour), 2) AS lag_1hr_duration,\n",
+ " ROUND(LAG(duration_in_seconds, 2) OVER (ORDER BY hour), 2) AS lag_2hr_duration,\n",
+ " -- Rolling average (3-hour window)\n",
+ " ROUND(AVG(duration_in_seconds) OVER (\n",
+ " ORDER BY hour \n",
+ " ROWS BETWEEN 2 PRECEDING AND CURRENT ROW\n",
+ " ), 2) AS rolling_avg_3hr\n",
+ "FROM regularized_series\n",
+ "ORDER BY hour DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds4_route_integrity_audit.sql\n",
+ "**Business Question**: When and for how long did specific routes experience extreme geometry deviations?\nUse Case: Identifies persistent \"integrity incidents\" rather than transient noise. By grouping consecutive failed records into windows, engineers can correlate failures with specific infrastructure changes, GPS outages, or registration updates.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds4_route_integrity_audit\n",
+ "/*\n",
+ " DEFINITION: Route Integrity\n",
+ " In RMI, 'Route Integrity' measures the spatial consistency between a route's \n",
+ " registered definition and its actual data collection performance.\n",
+ " \n",
+ " - The Baseline: 'intended_length' (meters) provided as a custom attribute during registration.\n",
+ " - The Signal: 'actual_length' (meters) calculated from the captured ST_LineString.\n",
+ " - High Integrity: A ratio near 1.0 (actual length matches intended definition).\n",
+ " - Low Integrity: Significant deviations (> 10%) indicate detours, missing road \n",
+ " segments, or incorrect metadata registration.\n",
+ "*/\n",
+ "\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Islands and Gaps\n",
+ " This query groups consecutive records that exceed a 10% length deviation threshold \n",
+ " into discrete failure windows.\n",
+ "*/\n",
+ "\n",
+ "WITH base_comparison AS (\n",
+ " SELECT\n",
+ " h.selected_route_id,\n",
+ " h.display_name,\n",
+ " h.record_time,\n",
+ " ST_LENGTH(h.route_geometry) AS actual_length,\n",
+ " CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) AS intended_length\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time` h\n",
+ " JOIN `boston_oct_2025_sample_data.routes_status` s USING (selected_route_id)\n",
+ " WHERE s.status = 'STATUS_RUNNING'\n",
+ " AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- Quality filter: Exclude non-continuous geometries\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ "),\n",
+ "outlier_flagging AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- Flag if deviation exceeds 10% (actual / intended)\n",
+ " IF(intended_length IS NOT NULL AND (SAFE_DIVIDE(actual_length, intended_length) > 1.1 OR SAFE_DIVIDE(actual_length, intended_length) < 0.9), 1, 0) as is_outlier\n",
+ " FROM base_comparison\n",
+ "),\n",
+ "streak_identification AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- A new streak starts if this is an outlier and the previous record (by time) wasn't.\n",
+ " -- COALESCE treats a route's first record (no previous row) as following a non-outlier.\n",
+ " IF(is_outlier = 1 AND COALESCE(LAG(is_outlier) OVER (PARTITION BY selected_route_id ORDER BY record_time), 0) = 0, 1, 0) as is_streak_start\n",
+ " FROM outlier_flagging\n",
+ "),\n",
+ "streak_grouping AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- Cumulative sum of starts creates a unique ID for each failure window\n",
+ " SUM(is_streak_start) OVER (PARTITION BY selected_route_id ORDER BY record_time) as streak_id\n",
+ " FROM streak_identification\n",
+ " WHERE is_outlier = 1\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " MIN(record_time) as failure_start,\n",
+ " MAX(record_time) as failure_end,\n",
+ " COUNT(*) as consecutive_records,\n",
+ " -- Average ratio across the window: Severity of the discrepancy\n",
+ " ROUND(AVG(SAFE_DIVIDE(actual_length, intended_length)), 2) as avg_deviation_ratio,\n",
+ " -- Identify if the deviation is an over-count (likely detour) or under-count (missing segments)\n",
+ " IF(AVG(actual_length) > AVG(intended_length), 'OVER_COUNT', 'UNDER_COUNT') as failure_type\n",
+ "FROM streak_grouping\n",
+ "GROUP BY selected_route_id, display_name, streak_id\n",
+ "-- Minimum streak length: 1 keeps every failure window; raise it (e.g., >= 3) to focus on persistent issues\n",
+ "HAVING consecutive_records >= 1\n",
+ "ORDER BY failure_start DESC, avg_deviation_ratio DESC\n",
+ "LIMIT 50;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds5_reliability_ranking.sql\n",
+ "**Business Question**: When and for how long did specific routes experience persistent travel time spikes?\nUse Case: Identifies chronic congestion incidents rather than transient variance. By grouping consecutive \"slow\" records into windows, operators can distinguish between random noise and actionable infrastructure failures or major events.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds5_reliability_ranking\n",
+ "/*\n",
+ " DEFINITION: Route Reliability vs. Route Integrity\n",
+ " - Route Integrity (DS4): Measures spatial consistency (Actual Geometry vs. Registered Definition).\n",
+ " - Route Reliability (DS5): Measures temporal performance (Actual Travel Time vs. Free-flow Baseline).\n",
+ " \n",
+ " High Reliability means a route's travel time is stable and near its ideal baseline. \n",
+ " Low Reliability (this query) indicates persistent periods of 'excess delay' \n",
+ " where actual travel times significantly exceed free-flow estimates.\n",
+ "*/\n",
+ "\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Reliability Gaps (Islands and Gaps)\n",
+ " 1. Take each route's static (free-flow) duration as its baseline.\n",
+ " 2. Flag records where travel time exceeds a 'significant delay' threshold (e.g., 1.5x baseline).\n",
+ " 3. Group consecutive flags into discrete failure windows (streaks).\n",
+ "*/\n",
+ "\n",
+ "WITH quality_filtered_history AS (\n",
+ " -- Standard quality filtering to ensure we analyze healthy geometries\n",
+ " SELECT\n",
+ " h.selected_route_id,\n",
+ " h.display_name,\n",
+ " h.record_time,\n",
+ " h.duration_in_seconds,\n",
+ " h.static_duration_in_seconds\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time` h\n",
+ " JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id)\n",
+ " WHERE h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ " AND SAFE_DIVIDE(ABS(ST_LENGTH(h.route_geometry) - CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)), CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)) < 0.05\n",
+ "),\n",
+ "incident_flagging AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- Threshold: Travel time is more than 50% above the static (free-flow) baseline\n",
+ " IF(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) > 1.5, 1, 0) as is_incident\n",
+ " FROM quality_filtered_history\n",
+ "),\n",
+ "streak_identification AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- Identify the start of a new consecutive incident window\n",
+ " IF(is_incident = 1 AND LAG(is_incident) OVER (PARTITION BY selected_route_id ORDER BY record_time) = 0, 1, \n",
+ " IF(is_incident = 1 AND LAG(is_incident) OVER (PARTITION BY selected_route_id ORDER BY record_time) IS NULL, 1, 0)) as is_streak_start\n",
+ " FROM incident_flagging\n",
+ "),\n",
+ "streak_grouping AS (\n",
+ " SELECT\n",
+ " *,\n",
+ " -- Cumulative sum of starts creates a unique ID for each incident window\n",
+ " SUM(is_streak_start) OVER (PARTITION BY selected_route_id ORDER BY record_time) as streak_id\n",
+ " FROM streak_identification\n",
+ " WHERE is_incident = 1\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " MIN(record_time) as incident_start,\n",
+ " MAX(record_time) as incident_end,\n",
+ " -- Total consecutive records in this incident window\n",
+ " COUNT(*) as consecutive_samples,\n",
+ " -- Average severity of the delay during this window\n",
+ " ROUND(AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)), 2) as avg_delay_ratio\n",
+ "FROM streak_grouping\n",
+ "GROUP BY selected_route_id, display_name, streak_id\n",
+ "-- Focus on persistent unreliability (lasting at least 3 samples)\n",
+ "HAVING consecutive_samples >= 3\n",
+ "ORDER BY avg_delay_ratio DESC, incident_start DESC\n",
+ "LIMIT 50;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds6_travel_time_forecasting.sql\n",
+ "**Business Question**: Can we predict next week's peak travel times based on the last 21 days of history?\nUse Case: Demonstrates a complete predictive workflow: Training an ARIMA_PLUS model, evaluating its seasonal fit, and performing backtesting against actual results.\nProduct Stage: GA (Uses BigQuery ML)\nEstimated Bytes Processed: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds6_travel_time_forecasting\n",
+ "/*\n",
+ " METHODOLOGY: TIME-SERIES BACKTESTING\n",
+ " To build trust in a traffic model, we use 'Backtesting'. We split our 31-day \n",
+ " sample dataset into two parts:\n",
+ " 1. Training Set (Weeks 1-3): The model 'learns' the route's diurnal and weekly rhythm.\n",
+ " 2. Validation Set (Week 4): We withhold this data from the model, then ask the \n",
+ " model to 'forecast' it. Comparing the forecast to reality gives us an \n",
+ " empirical accuracy score.\n",
+ "*/\n",
+ "\n",
+ "/*\n",
+ " INTERPRETATION & VISUALIZATION GUIDE:\n",
+ " \n",
+ " 1. REPORT INTERPRETATION:\n",
+ " - 'absolute_error': Smaller is better. Measures the magnitude of the prediction 'miss'.\n",
+ " - 'within_confidence_interval': This is your 'Anomaly Signal'. \n",
+ " - 'YES': Traffic is behaving normally/predictably.\n",
+ " - 'NO': A significant event occurred (accident, weather, gridlock) that \n",
+ " exceeded statistical expectations. This is the trigger for operational alerts.\n",
+ " \n",
+ " 2. RECOMMENDED VISUALIZATIONS:\n",
+ " - Time-Series Line: Plot 'forecast_seconds' and 'actual_seconds' on the same Y-axis.\n",
+ " - Confidence Band: Plot 'lower_bound' and 'upper_bound' as a shaded area. Dots \n",
+ " (actuals) falling outside this band are your truly actionable traffic incidents.\n",
+ "*/\n",
+ "\n",
+ "-- STEP 1: Train the ARIMA_PLUS model using a 3-week window.\n",
+ "-- We use hourly aggregation (AVG) to regularize the input for the ARIMA algorithm.\n",
+ "CREATE OR REPLACE MODEL `your-project.your-dataset.travel_time_forecast_model`\n",
+ "OPTIONS(\n",
+ " model_type='ARIMA_PLUS',\n",
+ " time_series_timestamp_col='record_hour',\n",
+ " time_series_data_col='duration_in_seconds',\n",
+ " auto_arima=TRUE, -- Automatically finds the best P, D, Q parameters.\n",
+ " data_frequency='HOURLY',\n",
+ " clean_spikes_and_dips=TRUE -- Prevents one-off accidents from skewing the long-term trend.\n",
+ ") AS\n",
+ "SELECT\n",
+ " TIMESTAMP_TRUNC(record_time, HOUR) as record_hour,\n",
+ " AVG(duration_in_seconds) as duration_in_seconds\n",
+ "FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ "WHERE selected_route_id = 'route-4202493217'\n",
+ " AND record_time >= '2025-10-01' AND record_time < '2025-10-22'\n",
+ "GROUP BY 1;\n",
+ "\n",
+ "-- STEP 2: Evaluate the model's training metrics.\n",
+ "-- This returns AIC, Log Likelihood, and identified seasonal periods (e.g., DAILY).\n",
+ "-- A low AIC relative to other models indicates a better fit.\n",
+ "SELECT * FROM ML.EVALUATE(MODEL `your-project.your-dataset.travel_time_forecast_model`);\n",
+ "\n",
+ "-- STEP 3: Compare Forecast vs. Actual for the 4th week (Backtesting).\n",
+ "-- We forecast a 168-hour 'horizon' (7 full days) to match the final week of October.\n",
+ "WITH forecast_data AS (\n",
+ " SELECT\n",
+ " forecast_timestamp,\n",
+ " forecast_value as predicted_duration,\n",
+ " prediction_interval_lower_bound as lower_bound,\n",
+ " prediction_interval_upper_bound as upper_bound\n",
+ " FROM ML.FORECAST(MODEL `your-project.your-dataset.travel_time_forecast_model`,\n",
+ " STRUCT(168 AS horizon, 0.9 AS confidence_level))\n",
+ "),\n",
+ "actual_data AS (\n",
+ " -- Aggregate actual withheld data to the same hourly grid for comparison.\n",
+ " SELECT\n",
+ " TIMESTAMP_TRUNC(record_time, HOUR) as record_hour,\n",
+ " AVG(duration_in_seconds) as actual_duration\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " WHERE selected_route_id = 'route-4202493217'\n",
+ " AND record_time BETWEEN '2025-10-22' AND '2025-10-29'\n",
+ " GROUP BY 1\n",
+ ")\n",
+ "SELECT\n",
+ " f.forecast_timestamp,\n",
+ " ROUND(f.predicted_duration, 1) as forecast_seconds,\n",
+ " ROUND(a.actual_duration, 1) as actual_seconds,\n",
+ " -- absolute_error: How many seconds off was the prediction?\n",
+ " ROUND(ABS(f.predicted_duration - a.actual_duration), 1) as absolute_error,\n",
+ " -- within_confidence_interval: Was reality within the 90% expected range?\n",
+ " IF(a.actual_duration BETWEEN f.lower_bound AND f.upper_bound, 'YES', 'NO') as within_confidence_interval,\n",
+ " -- Include bounds for visualization in tools like Looker Studio or Colab.\n",
+ " ROUND(f.lower_bound, 1) as lower_bound,\n",
+ " ROUND(f.upper_bound, 1) as upper_bound\n",
+ "FROM forecast_data f\n",
+ "LEFT JOIN actual_data a ON f.forecast_timestamp = a.record_hour\n",
+ "WHERE a.actual_duration IS NOT NULL\n",
+ "ORDER BY f.forecast_timestamp;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ds7_zero_shot_forecasting.sql\n",
+ "**Business Question**: Can we immediately forecast next-day traffic for multiple routes without waiting to train individual models?\nUse Case: Demonstrates 'Zero-Shot' forecasting using Google's Time Series Foundation Model (TimesFM). Unlike ARIMA, this model uses pre-trained patterns to predict future travel times for an entire cluster of routes simultaneously, even with limited local history.\nProduct Stage: GA (Uses AI.FORECAST with TimesFM)\nEstimated Bytes Processed: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_ds7_zero_shot_forecasting\n",
+ "/*\n",
+ " ANALYTICAL ADVANTAGE: Foundation Models vs. Traditional Models\n",
+ " - ARIMA_PLUS (DS6): Requires 'Training' (Learning) on specific route history first.\n",
+ " - TimesFM (DS7): Uses 'Zero-Shot' inference via AI.FORECAST. It applies global \n",
+ " patterns to your data immediately. Ideal for 'Cold Start' (new routes) or \n",
+ " scaling to thousands of routes without per-route training overhead.\n",
+ "*/\n",
+ "\n",
+ "-- STEP 1: Prepare a 'Context' window of history for multiple routes.\n",
+ "-- Foundation models like TimesFM perform best with 3-7 days of chronological context.\n",
+ "WITH route_context AS (\n",
+ " SELECT\n",
+ " selected_route_id,\n",
+ " TIMESTAMP_TRUNC(record_time, HOUR) as record_hour,\n",
+ " AVG(duration_in_seconds) as duration_in_seconds\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " -- We pick a 7-day context window for 3 specific routes\n",
+ " WHERE selected_route_id IN ('route-4202493217', 'route-3850158153', 'route-381361371')\n",
+ " AND record_time BETWEEN '2025-10-14' AND '2025-10-21'\n",
+ " GROUP BY 1, 2\n",
+ ")\n",
+ "-- STEP 2: Use AI.FORECAST to generate predictions.\n",
+ "-- Note: TimesFM is a managed foundation model; no CREATE MODEL is required.\n",
+ "SELECT\n",
+ " *\n",
+ "FROM AI.FORECAST(\n",
+ " TABLE route_context,\n",
+ " data_col => 'duration_in_seconds',\n",
+ " timestamp_col => 'record_hour',\n",
+ " model => 'TimesFM 2.0', -- Specify the foundation model version\n",
+ " id_cols => ['selected_route_id'], -- Forecast each route independently\n",
+ " horizon => 24, -- Forecast 24 hours ahead\n",
+ " confidence_level => 0.9 -- Generate 90% confidence intervals\n",
+ ")\n",
+ "ORDER BY selected_route_id, forecast_timestamp;\n",
+ ""
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/roads_management_insights/rmi_sample_queries/notebooks/RMI_Planner_Samples.ipynb b/roads_management_insights/rmi_sample_queries/notebooks/RMI_Planner_Samples.ipynb
new file mode 100644
index 0000000..ce422cf
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/notebooks/RMI_Planner_Samples.ipynb
@@ -0,0 +1,280 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n# Copyright 2026 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# https://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMI Sample Queries: RMI Planner (GA)\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook contains sample queries for the **RMI Planner** persona, specifically for the **GA** stage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "import pandas as pd\n",
+ "\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "project_id = 'your-project-id' #@param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Writable Dataset\n",
+ "\n",
+ "Several queries in this notebook (e.g., those creating Materialized Views, Models, or Views) require a **writable dataset** within your own project. \n",
+ "**Note**: The source `boston_oct_2025_sample_data` dataset is a read-only subscription and cannot be used to store new resources.\n",
+ "\n",
+ "Run the cell below to create a new dataset (e.g., `rmi_analysis`) in your project if you haven't already.\n",
+ "\n",
+ "**Important**: When running queries that create new BigQuery resources (e.g., tables, views, models) outside of these `%%bigquery` magic cells, remember to manually prefix the job ID with `msqlfactory--` for proper tracking. For example: `bq query --job_id=msqlfactory--your-descriptive-job-name ...`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset_id = \"rmi_analysis\" #@param {type:\"string\"}\n",
+ "! bq --location=US mk --dataset {project_id}:{dataset_id}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## GA (General Availability)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### rmip1_usage_projection.sql\n",
+ "**Business Question**: Based on current data, what is the rate of record creation, and how will it scale?\nUse Case: Helps sales teams estimate BigQuery storage and compute growth as a customer increases their monitoring footprint from a small pilot to an enterprise-wide fleet.\nProduct Stage: GA\nEstimated Bytes Processed: < 1 MB (Standard SQL on RMI Tables)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_rmip1_usage_projection\n",
+ "WITH daily_stats AS (\n",
+ " SELECT\n",
+ " DATE(record_time) as log_date,\n",
+ " selected_route_id,\n",
+ " COUNT(*) as records_per_day\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " GROUP BY 1, 2\n",
+ "),\n",
+ "avg_usage AS (\n",
+ " SELECT\n",
+ " AVG(records_per_day) as avg_daily_records_per_route\n",
+ " FROM daily_stats\n",
+ ")\n",
+ "SELECT\n",
+ " ROUND(avg_daily_records_per_route, 2) as avg_records_per_route_per_day,\n",
+ " -- Parameter: Target fleet size (e.g., 5,000 routes)\n",
+ " 5000 as target_route_count,\n",
+ " ROUND(avg_daily_records_per_route * 5000, 0) as estimated_total_daily_records,\n",
+ " -- Extrapolate to monthly volume in millions of records\n",
+ " ROUND(avg_daily_records_per_route * 5000 * 30 / 1000000, 2) as estimated_monthly_millions\n",
+ "FROM avg_usage;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### rmip2_customer_roi.sql\n",
+ "**Business Question**: How much total time is lost to congestion across different customer service tiers?\nUse Case: Translates raw traffic data into \"Business Value\" by quantifying the potential time savings for priority routes, justifying the monitoring cost and providing a clear ROI for the customer.\nProduct Stage: GA\nEstimated Bytes Processed: < 1 MB (Standard SQL on RMI Tables)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_rmip2_customer_roi\n",
+ "SELECT\n",
+ " JSON_VALUE(route_attributes, '$.tier') as service_tier,\n",
+ " -- Aggregate total lost time (Actual - Free-flow) converted to hours\n",
+ " ROUND(SUM(duration_in_seconds - static_duration_in_seconds) / 3600, 1) as total_delay_hours,\n",
+ " COUNT(DISTINCT h.selected_route_id) as monitored_routes,\n",
+ " -- Average performance multiplier\n",
+ " ROUND(AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)), 2) as avg_delay_index\n",
+ "FROM `boston_oct_2025_sample_data.historical_travel_time` h\n",
+ "JOIN `boston_oct_2025_sample_data.routes_status` s ON h.selected_route_id = s.selected_route_id\n",
+ "WHERE h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- Filter for records where actual was slower than free-flow\n",
+ " AND (duration_in_seconds - static_duration_in_seconds) > 0\n",
+ "GROUP BY 1\n",
+ "HAVING service_tier IS NOT NULL\n",
+ "ORDER BY total_delay_hours DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### rmip3_segment_estimation.sql\n",
+ "**Business Question**: How many physical road segments exist in our target study area, categorized by class?\nUse Case: Helps sales and solution architects estimate the \"Total Addressable Monitoring\" footprint for a city, aiding in pricing and coverage strategy.\nProduct Stage: GA\nEstimated Bytes Processed: ~1 MB (Uses BigQuery Public Dataset: Overture Maps)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_rmip3_segment_estimation\n",
+ "/*\n",
+ " EXTERNAL DEPENDENCY: \n",
+ " RMI monitoring is based on user-defined routes. To understand the underlying \n",
+ " physical scale of an area, this query joins with the Overture Maps public \n",
+ " dataset to provide a baseline count of all physical road segments.\n",
+ "*/\n",
+ "\n",
+ "WITH target_boundary AS (\n",
+ " SELECT geometry\n",
+ " FROM `bigquery-public-data.overture_maps.division_area`\n",
+ " WHERE names.primary = 'Boston' \n",
+ " AND country = 'US' \n",
+ " AND region = 'US-MA'\n",
+ " AND class = 'land'\n",
+ ")\n",
+ "SELECT\n",
+ " -- Group by physical road classification (e.g., motorway, primary, local)\n",
+ " class as road_class,\n",
+ " subtype,\n",
+ " COUNT(*) as segment_count,\n",
+ " ROUND(SUM(ST_LENGTH(s.geometry)) / 1000, 2) as total_length_km\n",
+ "FROM `bigquery-public-data.overture_maps.segment` s\n",
+ "JOIN target_boundary b ON ST_INTERSECTS(s.geometry, b.geometry)\n",
+ "WHERE s.subtype = 'road'\n",
+ "GROUP BY 1, 2\n",
+ "ORDER BY segment_count DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### rmip4_area_boundary.sql\n",
+ "**Business Question**: How can I create a reusable, open-source administrative boundary for my target study area?\nUse Case: Establishes a \"Master Boundary\" for a city or region using public data. This view can then be joined with RMI tables to automate geofencing and localized reporting.\nProduct Stage: GA\nEstimated Bytes Processed: ~1 MB (Uses BigQuery Public Dataset: Overture Maps)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_rmip4_area_boundary\n",
+ "/*\n",
+ " NOTE: This query creates a persistent view of a target city's official boundary.\n",
+ " The source dataset (e.g., `boston_oct_2025_sample_data`) is read-only.\n",
+ " This view MUST be created in a separate, writable dataset within your project.\n",
+ " Replace `your-project.your-dataset` with your target location.\n",
+ "*/\n",
+ "\n",
+ "CREATE OR REPLACE VIEW `your-project.your-dataset.target_area_boundary` \n",
+ "(\n",
+ " division_id OPTIONS(description=\"Stable identifier for the administrative division.\"),\n",
+ " area_name OPTIONS(description=\"The primary display name (e.g. Boston).\"),\n",
+ " region OPTIONS(description=\"The ISO state/province code (e.g. US-MA).\"),\n",
+ " country OPTIONS(description=\"The ISO country code.\"),\n",
+ " geometry OPTIONS(description=\"The physical land boundary of the division as a GEOGRAPHY polygon.\")\n",
+ ")\n",
+ "OPTIONS(\n",
+ " description=\"A reusable administrative boundary for geofencing RMI analytical assets.\"\n",
+ ")\n",
+ "AS\n",
+ "SELECT \n",
+ " id AS division_id,\n",
+ " names.primary AS area_name,\n",
+ " region,\n",
+ " country,\n",
+ " geometry\n",
+ "FROM `bigquery-public-data.overture_maps.division_area`\n",
+ "WHERE names.primary = 'Boston' \n",
+ " AND country = 'US' \n",
+ " AND region = 'US-MA'\n",
+ " AND class = 'land';\n",
+ ""
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/roads_management_insights/rmi_sample_queries/notebooks/Traffic_Operations_Manager_Samples.ipynb b/roads_management_insights/rmi_sample_queries/notebooks/Traffic_Operations_Manager_Samples.ipynb
new file mode 100644
index 0000000..da2680d
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/notebooks/Traffic_Operations_Manager_Samples.ipynb
@@ -0,0 +1,344 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n# Copyright 2026 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# https://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMI Sample Queries: Traffic Operations Manager (GA)\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook contains sample queries for the **Traffic Operations Manager** persona, specifically for the **GA** stage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "import pandas as pd\n",
+ "\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "project_id = 'your-project-id' #@param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Writable Dataset\n",
+ "\n",
+ "Several queries in this notebook (e.g., those creating Materialized Views, Models, or Views) require a **writable dataset** within your own project. \n",
+ "**Note**: The source `boston_oct_2025_sample_data` dataset is a read-only subscription and cannot be used to store new resources.\n",
+ "\n",
+ "Run the cell below to create a new dataset (e.g., `rmi_analysis`) in your project if you haven't already.\n",
+ "\n",
+ "**Important**: When running queries that create new BigQuery resources (e.g., tables, views, models) outside of these `%%bigquery` magic cells, remember to manually prefix the job ID with `msqlfactory--` for proper tracking. For example: `bq query --job_id=msqlfactory--your-descriptive-job-name ...`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset_id = \"rmi_analysis\" #@param {type:\"string\"}\n",
+ "! bq --location=US mk --dataset {project_id}:{dataset_id}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## GA (General Availability)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### tom1_peak_hour_delay.sql\n",
+ "**Business Question**: What is the average travel time delay during the morning peak (7-9 AM) for the top 10 most congested routes?\nUse Case: Identifies critical morning commute bottlenecks to inform operational decisions or public messaging.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_tom1_peak_hour_delay\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Temporal Filtering\n",
+ " This query uses EXTRACT(HOUR...) on a converted DATETIME to focus on local \n",
+ " Boston peak windows. It filters for active routes and applies a quality \n",
+ " check to ensure the geometry is a single ST_LineString.\n",
+ "*/\n",
+ "\n",
+ "WITH peak_hour_data AS (\n",
+ " SELECT\n",
+ " h.selected_route_id,\n",
+ " h.display_name,\n",
+ " -- delay_ratio > 1.0 indicates travel time is slower than free-flow (static)\n",
+ " SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds) AS delay_ratio\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time` AS h\n",
+ " JOIN `boston_oct_2025_sample_data.routes_status` AS s USING (selected_route_id)\n",
+ " WHERE h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- STATUS_RUNNING ensures we only analyze routes that are currently being monitored\n",
+ " AND s.status = 'STATUS_RUNNING'\n",
+ " -- AM Peak Window: 7:00 AM to 8:59 AM Local Time\n",
+ " AND EXTRACT(HOUR FROM DATETIME(h.record_time, 'America/New_York')) BETWEEN 7 AND 8\n",
+ " -- Geometry Integrity: Only process continuous, healthy paths\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ ")\n",
+ "SELECT\n",
+ " display_name,\n",
+ " ROUND(AVG(delay_ratio), 3) AS avg_delay_ratio,\n",
+ " COUNT(*) AS sample_count\n",
+ "FROM peak_hour_data\n",
+ "GROUP BY 1\n",
+ "-- Threshold: Filter for routes that are at least marginally slower than free-flow\n",
+ "HAVING avg_delay_ratio > 1.0\n",
+ "ORDER BY avg_delay_ratio DESC\n",
+ "LIMIT 10;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### tom2_persistent_bottlenecks.sql\n",
+ "**Business Question**: Which road segments (SRIs) have been in a 'TRAFFIC_JAM' state most frequently?\nUse Case: Locates recurring local bottlenecks within routes, enabling targeted infrastructure investigation or signal timing adjustments.\nProduct Stage: GA\nEstimated Bytes Processed: ~250 MB (Requires UNNEST of speed_reading_intervals)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_tom2_persistent_bottlenecks\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: SRI Unnesting\n",
+ " RMI routes store segment-level traffic states (SRI) in a nested array. \n",
+ " This query 'explodes' that array using UNNEST to audit the frequency of \n",
+ " severe congestion across the entire network.\n",
+ "*/\n",
+ "\n",
+ "WITH exploded_sris AS (\n",
+ " SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " -- 'speed' represents the RMI traffic state for that specific interval\n",
+ " sri.speed\n",
+ " FROM `boston_oct_2025_sample_data.recent_roads_data`,\n",
+ " UNNEST(speed_reading_intervals) AS sri\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " -- We count every occurrence of an interval being in a 'TRAFFIC_JAM'\n",
+ " COUNT(*) AS traffic_jam_count\n",
+ "FROM exploded_sris\n",
+ "-- Filter exclusively for the most severe RMI congestion state\n",
+ "WHERE speed = 'TRAFFIC_JAM'\n",
+ "GROUP BY 1, 2\n",
+ "ORDER BY traffic_jam_count DESC\n",
+ "LIMIT 10;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### tom3_operational_health.sql\n",
+ "**Business Question**: Which active routes are currently flagged with a 'LOW_ROAD_USAGE' validation error?\nUse Case: Monitors the reliability of data collection. Low usage flags indicate that insights for these routes may be based on fewer probes, requiring a review of route priority or placement.\nProduct Stage: GA\nEstimated Bytes Processed: < 1 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_tom3_operational_health\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Status Auditing\n",
+ " This query inspects the management plane table (routes_status) to identify \n",
+ " active routes that have quality warnings. This is critical for maintaining \n",
+ " trust in downstream traffic analytics.\n",
+ "*/\n",
+ "\n",
+ "SELECT\n",
+ " display_name,\n",
+ " selected_route_id,\n",
+ " status,\n",
+ " validation_error,\n",
+ " -- 'low_road_usage_start_time' is specifically populated when probe density drops below threshold\n",
+ " low_road_usage_start_time,\n",
+ " -- Time elapsed since the error was first detected (relative to sample end date)\n",
+ " DATETIME_DIFF(DATETIME('2025-11-01'), DATETIME(low_road_usage_start_time, 'UTC'), DAY) AS days_in_error_state\n",
+ "FROM `boston_oct_2025_sample_data.routes_status`\n",
+ "-- We only care about errors on routes that are supposed to be active (STATUS_RUNNING)\n",
+ "WHERE status = 'STATUS_RUNNING'\n",
+ " -- Filter specifically for the Low Road Usage warning\n",
+ " AND validation_error = 'VALIDATION_ERROR_LOW_ROAD_USAGE'\n",
+ "ORDER BY days_in_error_state DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### tom4_data_latency.sql\n",
+ "**Business Question**: Are there any active routes that have stopped sending data near the end of the snapshot period?\nUse Case: Detects localized data gaps or \"silent\" routes in real-time, enabling operators to investigate issues before they impact reporting.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_tom4_data_latency\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Freshness Monitoring\n",
+ " By comparing the max record_time per route against the overall dataset \n",
+ " end-time, we can identify routes that have 'gone silent'.\n",
+ "*/\n",
+ "\n",
+ "WITH last_data_arrival AS (\n",
+ " SELECT\n",
+ " selected_route_id,\n",
+ " -- Get the latest record timestamp for every route in the dataset\n",
+ " MAX(record_time) AS last_arrival\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " -- Focused partition scan for the full sample month\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " GROUP BY 1\n",
+ ")\n",
+ "SELECT\n",
+ " s.selected_route_id,\n",
+ " s.display_name,\n",
+ " l.last_arrival,\n",
+ " -- Measured relative to the very end of the sample dataset ('2025-11-01')\n",
+ " TIMESTAMP_DIFF(TIMESTAMP('2025-11-01 00:00:00'), l.last_arrival, MINUTE) as minutes_of_silence\n",
+ "FROM `boston_oct_2025_sample_data.routes_status` s\n",
+ "LEFT JOIN last_data_arrival l USING (selected_route_id)\n",
+ "-- Focus on routes that are supposed to be producing data\n",
+ "WHERE s.status = 'STATUS_RUNNING'\n",
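+ " -- Threshold: Highlight routes with no record in the last 2 minutes of the dataset,\n",
+ " -- including routes with no records at all (NULL last_arrival from the LEFT JOIN)\n",
+ " AND (l.last_arrival IS NULL OR l.last_arrival < TIMESTAMP('2025-10-31 23:58:00'))\n",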
+ "ORDER BY minutes_of_silence DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### tom5_significant_event_detection.sql\n",
+ "**Business Question**: Which routes experienced a travel time more than double their static baseline in the last 24 hours?\nUse Case: Automates the detection of extreme traffic events (accidents, severe weather, gridlock) that require immediate operational intervention.\nProduct Stage: GA\nEstimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_tom5_significant_event_detection\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Threshold-Based Alerting\n",
+ " This query identifies major traffic incidents by flagging records where the \n",
+ " actual travel time is more than 2x the free-flow baseline (static_duration_in_seconds). \n",
+ " It applies quality filters to ensure alerts are only triggered for single, \n",
+ " continuous paths.\n",
+ "*/\n",
+ "\n",
+ "SELECT\n",
+ " h.display_name,\n",
+ " h.selected_route_id,\n",
+ " h.record_time,\n",
+ " h.duration_in_seconds,\n",
+ " h.static_duration_in_seconds,\n",
+ " -- Delay ratio > 2.0 means travel time is more than 2x the free-flow baseline\n",
+ " ROUND(SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds), 2) AS delay_ratio\n",
+ "FROM `boston_oct_2025_sample_data.historical_travel_time` AS h\n",
+ "JOIN `boston_oct_2025_sample_data.routes_status` AS s USING(selected_route_id)\n",
+ "-- Filter for the final day of the sample dataset (the last 24 hours)\n",
+ "WHERE h.record_time BETWEEN '2025-10-31' AND '2025-11-01'\n",
+ " -- Focus on active monitoring fleet\n",
+ " AND s.status = 'STATUS_RUNNING'\n",
+ " -- Filter for \"Significant\" events\n",
+ " AND SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds) > 2.0\n",
+ " -- Quality filter: Exclude non-continuous geometries\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ "ORDER BY delay_ratio DESC;\n",
+ ""
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/roads_management_insights/rmi_sample_queries/notebooks/Urban_Planner_Samples.ipynb b/roads_management_insights/rmi_sample_queries/notebooks/Urban_Planner_Samples.ipynb
new file mode 100644
index 0000000..e2449c4
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/notebooks/Urban_Planner_Samples.ipynb
@@ -0,0 +1,350 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n# Copyright 2026 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# https://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RMI Sample Queries: Urban Planner (GA)\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook contains sample queries for the **Urban Planner** persona, specifically for the **GA** stage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "import pandas as pd\n",
+ "\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "project_id = 'your-project-id' #@param {type:\"string\"}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Writable Dataset\n",
+ "\n",
+ "Several queries in this notebook (e.g., those creating Materialized Views, Models, or Views) require a **writable dataset** within your own project. \n",
+ "**Note**: The source `boston_oct_2025_sample_data` dataset is a read-only subscription and cannot be used to store new resources.\n",
+ "\n",
+ "Run the cell below to create a new dataset (e.g., `rmi_analysis`) in your project if you haven't already.\n",
+ "\n",
+ "**Important**: When running queries that create new BigQuery resources (e.g., tables, views, models) outside of these `%%bigquery` magic cells, remember to manually prefix the job ID with `msqlfactory--` for proper tracking. For example: `bq query --job_id=msqlfactory--your-descriptive-job-name ...`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset_id = \"rmi_analysis\" #@param {type:\"string\"}\n",
+ "! bq --location=US mk --dataset {project_id}:{dataset_id}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## GA (General Availability)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### up1_corridor_trend.sql\n",
+ "**Business Question**: What has been the week-over-week trend in the average delay ratio for a specific corridor?\n\n**Use Case**: Enables long-term performance monitoring of critical transportation infrastructure, helping planners identify whether congestion is worsening or improving over time.\n\n**Product Stage**: GA\n\n**Estimated Bytes Processed**: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_up1_corridor_trend\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Weekly Trend Aggregation\n",
+ " This query truncates timestamps to the week level to smooth out day-to-day \n",
+ " fluctuations, focusing on the macro traffic behavior of a critical route.\n",
+ "*/\n",
+ "\n",
+ "WITH weekly_trends AS (\n",
+ " SELECT\n",
+ " selected_route_id,\n",
+ " -- Truncate to the start of the week for consistent aggregation\n",
+ " TIMESTAMP_TRUNC(record_time, WEEK) AS week,\n",
+ " -- Calculate average delay (Actual / Free-flow baseline)\n",
+ " AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)) AS avg_delay_ratio\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " -- Filter for a specific corridor of interest (e.g., Storrow Drive)\n",
+ " WHERE selected_route_id = 'route-4202493217'\n",
+ " AND record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " GROUP BY 1, 2\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " -- Format for readable year-week reporting (%U = Sunday-based week number, matching TIMESTAMP_TRUNC(..., WEEK))\n",
+ " FORMAT_TIMESTAMP('%Y-%U', week) AS year_week,\n",
+ " ROUND(avg_delay_ratio, 3) AS avg_delay_ratio\n",
+ "FROM weekly_trends\n",
+ "ORDER BY week;\n",
+ ""
+ ]
+ },
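+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The weekly averages above can be turned into an explicit week-over-week change. The cell below is a minimal sketch and not part of the original query set: it reuses the same route ID and date range, and the names `df_up1_wow_change` and `wow_change` are illustrative assumptions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_up1_wow_change\n",
+ "/*\n",
+ " ILLUSTRATIVE SKETCH: Week-over-Week Delta\n",
+ " Uses the LAG window function to compare each week's average delay ratio\n",
+ " with the previous week's value for the same corridor.\n",
+ "*/\n",
+ "\n",
+ "WITH weekly_trends AS (\n",
+ "  SELECT\n",
+ "    TIMESTAMP_TRUNC(record_time, WEEK) AS week,\n",
+ "    AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)) AS avg_delay_ratio\n",
+ "  FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ "  WHERE selected_route_id = 'route-4202493217'\n",
+ "    AND record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ "  GROUP BY 1\n",
+ ")\n",
+ "SELECT\n",
+ "  week,\n",
+ "  ROUND(avg_delay_ratio, 3) AS avg_delay_ratio,\n",
+ "  -- Difference from the previous week's average; NULL for the first week\n",
+ "  ROUND(avg_delay_ratio - LAG(avg_delay_ratio) OVER (ORDER BY week), 3) AS wow_change\n",
+ "FROM weekly_trends\n",
+ "ORDER BY week;\n",
+ ""
+ ]
+ },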
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### up2_impact_analysis.sql\n",
+ "**Business Question**: Has the average travel time on routes passing through a recent construction zone improved since the project's completion date?\n\n**Use Case**: Provides empirical evidence of infrastructure project success, validating whether road improvements (like new lanes or signals) actually reduced congestion.\n\n**Product Stage**: GA\n\n**Estimated Bytes Processed**: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_up2_impact_analysis\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Spatial & Milestone Join\n",
+ " This query uses a DECLARE statement for the study area geometry to ensure \n",
+ " BigQuery treats the polygon as a constant, enabling efficient spatial \n",
+ " indexing during the ST_INTERSECTS filter. It then segments the data based \n",
+ " on a chronological project milestone.\n",
+ "*/\n",
+ "\n",
+ "-- Study Area: Downtown Boston Intersection\n",
+ "DECLARE study_area GEOGRAPHY DEFAULT ST_GEOGFROMTEXT('POLYGON((-71.06 42.35, -71.05 42.35, -71.05 42.34, -71.06 42.34, -71.06 42.35))');\n",
+ "-- Project Milestone: Date when construction was completed\n",
+ "DECLARE completion_date DATE DEFAULT '2025-10-15';\n",
+ "\n",
+ "WITH impact_data AS (\n",
+ " SELECT\n",
+ " -- Split records into 'Before' and 'After' buckets (the DATE milestone is cast to TIMESTAMP for the comparison)\n",
+ " record_time >= TIMESTAMP(completion_date) AS is_after_completion,\n",
+ " SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) AS delay_ratio\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " -- Filter for routes that physically pass through the study zone\n",
+ " WHERE ST_INTERSECTS(route_geometry, study_area)\n",
+ " AND record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ ")\n",
+ "SELECT\n",
+ " is_after_completion,\n",
+ " ROUND(AVG(delay_ratio), 3) AS avg_delay_ratio,\n",
+ " COUNT(*) as sample_count\n",
+ "FROM impact_data\n",
+ "GROUP BY 1\n",
+ "ORDER BY 1;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### up3_monitoring_density.sql\n",
+ "**Business Question**: Which geographic areas show the highest concentration of RMI route monitoring?\n\n**Use Case**: Helps planners identify \"blind spots\" in their monitoring network or confirm that critical urban zones are sufficiently covered by RMI probes.\n\n**Product Stage**: GA\n\n**Estimated Bytes Processed**: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_up3_monitoring_density\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Spatial Grid Aggregation\n",
+ " This query maps the RMI monitoring footprint by calculating route centroids \n",
+ " and grouping them into a ~110 m grid (3 decimal places of a degree). This provides a \n",
+ " coarse-grained view of network density without high computational overhead.\n",
+ "*/\n",
+ "\n",
+ "WITH route_centroids AS (\n",
+ " SELECT \n",
+ " selected_route_id,\n",
+ " -- Use the centroid to represent the general location of the route polyline\n",
+ " ST_CENTROID(route_geometry) as centroid\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ ")\n",
+ "SELECT\n",
+ " -- Grid the coordinates to 3 decimal places (~110 m in latitude)\n",
+ " ROUND(ST_Y(centroid), 3) AS lat_grid,\n",
+ " ROUND(ST_X(centroid), 3) AS lon_grid,\n",
+ " -- Count unique route definitions in this grid cell\n",
+ " COUNT(DISTINCT selected_route_id) AS unique_routes_monitored,\n",
+ " -- Count total traffic samples captured in this grid cell\n",
+ " COUNT(*) AS total_samples_collected\n",
+ "FROM route_centroids\n",
+ "GROUP BY 1, 2\n",
+ "ORDER BY unique_routes_monitored DESC\n",
+ "LIMIT 20;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### up4_weekend_vs_weekday.sql\n",
+ "**Business Question**: How does average travel time in the afternoon (2-5 PM) differ between weekdays and weekends?\n\n**Use Case**: Informs urban policy decisions like congestion pricing or off-peak transit scheduling by highlighting when road demand is most elastic.\n\n**Product Stage**: GA\n\n**Estimated Bytes Processed**: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_up4_weekend_vs_weekday\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Day-Type Segmentation\n",
+ " This query uses EXTRACT(DAYOFWEEK...) to categorize records into binary \n",
+ " 'Weekday' or 'Weekend' buckets. It combines this with a peak-window filter \n",
+ " to provide a clean comparison of temporal demand shifts.\n",
+ "*/\n",
+ "\n",
+ "WITH afternoon_stats AS (\n",
+ " SELECT\n",
+ " -- Day segmentation: 1 = Sunday, 7 = Saturday\n",
+ " CASE \n",
+ " WHEN EXTRACT(DAYOFWEEK FROM DATETIME(record_time, 'America/New_York')) IN (1, 7) THEN 'Weekend'\n",
+ " ELSE 'Weekday'\n",
+ " END AS day_type,\n",
+ " duration_in_seconds,\n",
+ " static_duration_in_seconds\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time`\n",
+ " WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- Afternoon period: 2 PM to 5 PM Local Time (Boston), i.e. hours 14, 15, and 16\n",
+ " AND EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 14 AND 16\n",
+ ")\n",
+ "SELECT\n",
+ " day_type,\n",
+ " -- Calculate Average Delay Index (Actual / Ideal)\n",
+ " ROUND(AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)), 3) AS avg_delay_ratio,\n",
+ " ROUND(AVG(duration_in_seconds), 2) AS avg_duration_seconds,\n",
+ " COUNT(*) as sample_count\n",
+ "FROM afternoon_stats\n",
+ "GROUP BY 1\n",
+ "ORDER BY avg_delay_ratio DESC;\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### up5_geofenced_congestion.sql\n",
+ "**Business Question**: Within a specific downtown polygon, which routes are currently seeing travel times more than 50% above their static baseline?\n\n**Use Case**: Enables targeted traffic management and demand-response strategies within high-density zones or during special events.\n\n**Product Stage**: GA\n\n**Estimated Bytes Processed**: ~150 MB"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bigquery --project {project_id} df_up5_geofenced_congestion\n",
+ "/*\n",
+ " ANALYTICAL PATTERN: Spatial Geofencing\n",
+ " This query uses a DECLARE statement for the downtown polygon to ensure \n",
+ " BigQuery treats the study area as a constant, enabling efficient spatial \n",
+ " indexing during the ST_INTERSECTS filter. It identifies routes that are \n",
+ " physically impacted by a specific urban zone.\n",
+ "*/\n",
+ "\n",
+ "-- Study Area: Downtown Boston Geofence\n",
+ "DECLARE downtown_zone GEOGRAPHY DEFAULT ST_GEOGFROMTEXT('POLYGON((-71.066 42.358, -71.052 42.358, -71.052 42.348, -71.066 42.348, -71.066 42.358))');\n",
+ "\n",
+ "WITH intersecting_routes AS (\n",
+ " SELECT\n",
+ " h.selected_route_id,\n",
+ " h.display_name,\n",
+ " SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds) AS delay_ratio\n",
+ " FROM `boston_oct_2025_sample_data.historical_travel_time` h\n",
+ " WHERE ST_INTERSECTS(h.route_geometry, downtown_zone)\n",
+ " AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'\n",
+ " -- Quality filter: Exclude non-continuous geometries\n",
+ " AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'\n",
+ ")\n",
+ "SELECT\n",
+ " selected_route_id,\n",
+ " display_name,\n",
+ " ROUND(AVG(delay_ratio), 3) AS avg_delay_ratio,\n",
+ " COUNT(*) as sample_count\n",
+ "FROM intersecting_routes\n",
+ "GROUP BY 1, 2\n",
+ "-- Threshold: Keep routes whose average travel time is more than 1.5x the free-flow baseline\n",
+ "HAVING avg_delay_ratio > 1.5\n",
+ "ORDER BY avg_delay_ratio DESC;\n",
+ ""
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa0_metadata_inventory.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa0_metadata_inventory.sql
new file mode 100644
index 0000000..052675c
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa0_metadata_inventory.sql
@@ -0,0 +1,38 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Q0: Metadata Inventory and Partition Overview
+-- Business Question: How can I quickly check the row count and storage size of all RMI tables using low-cost metadata queries?
+-- Product Stage: GA
+-- Estimated Bytes Processed: N/A (Metadata Query)
+
+/*
+ This query utilizes INFORMATION_SCHEMA.PARTITIONS to provide a high-level
+ overview of table scale and data accumulation trends.
+ It reads system metadata rather than table data, so it is inexpensive. Note that
+ on-demand pricing still bills INFORMATION_SCHEMA queries at a 10 MB minimum.
+*/
+
+SELECT
+ table_name,
+ CASE
+ WHEN partition_id IS NULL OR partition_id = '__UNPARTITIONED__' THEN 'UNPARTITIONED'
+ ELSE partition_id
+ END as partition_id,
+ total_rows,
+ ROUND(total_logical_bytes / POW(1024, 2), 2) as size_mb,
+ last_modified_time
+FROM `boston_oct_2025_sample_data.INFORMATION_SCHEMA.PARTITIONS`
+WHERE table_name IN ('historical_travel_time', 'recent_roads_data', 'routes_status')
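+ -- Keep unpartitioned tables (NULL partition_id) while dropping the '__NULL__' partition
+ AND (partition_id IS NULL OR partition_id != '__NULL__')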
+ORDER BY table_name, partition_id DESC;
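+
+-- Illustrative follow-up (a sketch, not part of the original query): roll the
+-- per-partition rows above up to one row per table. The aliases
+-- 'partition_count' and 'total_size_mb' are assumptions chosen for readability.
+SELECT
+  table_name,
+  -- Number of partition rows reported for the table (1 for unpartitioned tables)
+  COUNT(*) AS partition_count,
+  SUM(total_rows) AS total_rows,
+  ROUND(SUM(total_logical_bytes) / POW(1024, 2), 2) AS total_size_mb,
+  MAX(last_modified_time) AS last_modified_time
+FROM `boston_oct_2025_sample_data.INFORMATION_SCHEMA.PARTITIONS`
+WHERE table_name IN ('historical_travel_time', 'recent_roads_data', 'routes_status')
+GROUP BY table_name
+ORDER BY total_size_mb DESC;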
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa1_scan_volume.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa1_scan_volume.sql
new file mode 100644
index 0000000..1971f0a
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa1_scan_volume.sql
@@ -0,0 +1,47 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- BigQuery Admin Query 1: Scan Volume Monitoring by User
+-- Business Question: Which users or service accounts are generating the highest scan volume against RMI tables this month?
+-- Use Case: Enables cost governance by identifying 'heavy' consumers of the RMI dataset. Administrators can use this data to justify budget reallocations or suggest query optimizations to specific teams.
+-- Product Stage: GA (Uses BigQuery INFORMATION_SCHEMA)
+-- Estimated Bytes Processed: N/A (Metadata Query)
+
+/*
+ AUDIT PATTERN: Job Metadata Analysis
+ This query scans the system-managed JOBS view. It calculates total data scanned
+ (billed bytes) for any query that mentions core RMI table names.
+
+ Note: Replace 'region-us' with the specific region where your dataset resides.
+*/
+
+SELECT
+ user_email,
+ -- Convert bytes to GB for readable billing analysis
+ SUM(total_bytes_billed) / POW(1024, 3) AS total_gb_billed,
+ COUNT(*) AS job_count,
+ -- Average scan size helps distinguish between 'many small queries' vs 'one massive scan'
+ AVG(total_bytes_billed) / POW(1024, 3) AS avg_gb_per_job
+FROM `region-us`.INFORMATION_SCHEMA.JOBS
+WHERE creation_time BETWEEN TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH) AND CURRENT_TIMESTAMP()
+ AND job_type = 'QUERY'
+ -- Heuristic filter: Look for queries mentioning RMI core tables
+ AND (
+ query LIKE '%historical_travel_time%'
+ OR query LIKE '%recent_roads_data%'
+ OR query LIKE '%routes_status%'
+ )
+GROUP BY 1
+ORDER BY total_gb_billed DESC
+LIMIT 10;
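+
+-- Illustrative alternative (a sketch, not part of the original query): the LIKE
+-- filter above matches query text, so it can also match comments or unrelated
+-- identifiers. The JOBS view exposes a 'referenced_tables' array, which allows
+-- matching on the tables a job actually read. Whether this suits your audit
+-- (for example, how views and scripts are reported) is an assumption to verify.
+SELECT
+  user_email,
+  SUM(total_bytes_billed) / POW(1024, 3) AS total_gb_billed,
+  COUNT(*) AS job_count
+FROM `region-us`.INFORMATION_SCHEMA.JOBS
+WHERE creation_time BETWEEN TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH) AND CURRENT_TIMESTAMP()
+  AND job_type = 'QUERY'
+  -- Keep jobs that referenced at least one RMI core table
+  AND EXISTS (
+    SELECT 1
+    FROM UNNEST(referenced_tables) AS t
+    WHERE t.table_id IN ('historical_travel_time', 'recent_roads_data', 'routes_status')
+  )
+GROUP BY user_email
+ORDER BY total_gb_billed DESC
+LIMIT 10;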
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa2_cost_attribution.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa2_cost_attribution.sql
new file mode 100644
index 0000000..7b28eaa
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa2_cost_attribution.sql
@@ -0,0 +1,48 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- BigQuery Admin Query 2: Cost Attribution Audit (Missing Prefixes)
+-- Business Question: Identify any BigQuery jobs missing the mandatory 'rmisqlfactory_' prefix in their job IDs.
+-- Use Case: Ensures compliance with project governance standards. Consistent job ID prefixing is required for accurate cost attribution and auditing of RMI-related analysis.
+-- Product Stage: GA (Uses BigQuery INFORMATION_SCHEMA)
+-- Estimated Bytes Processed: N/A (Metadata Query)
+
+/*
+ NOTE: 'rmisqlfactory_' is the mandatory job ID prefix for this workspace.
+ This allows administrators to filter billing logs and correlate spend
+ with specific personas or tools.
+
+ SCOPE NOTE: Replace 'JOBS' with 'JOBS_BY_ORGANIZATION' if you have the
+ necessary permissions to audit spend across multiple projects.
+*/
+
+SELECT
+ job_id,
+ user_email,
+ creation_time,
+ total_bytes_billed,
+ -- Provide the query text to help identify the source of the non-compliant job
+ query
+FROM `region-us`.INFORMATION_SCHEMA.JOBS
+WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
+ AND job_type = 'QUERY'
+ -- Filter for jobs that missed the mandatory prefix
+ AND NOT STARTS_WITH(job_id, 'rmisqlfactory_')
+ -- Only audit jobs that were targeting RMI core tables
+ AND (
+ query LIKE '%historical_travel_time%'
+ OR query LIKE '%recent_roads_data%'
+ OR query LIKE '%routes_status%'
+ )
+ORDER BY creation_time DESC;
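+
+-- Illustrative companion query (a sketch, not part of the original audit):
+-- summarize prefix compliance per user instead of listing individual jobs.
+-- The aliases 'jobs_missing_prefix' and 'missing_prefix_share' are assumptions.
+SELECT
+  user_email,
+  COUNT(*) AS rmi_job_count,
+  -- Jobs whose ID does not start with the mandatory prefix
+  COUNTIF(NOT STARTS_WITH(job_id, 'rmisqlfactory_')) AS jobs_missing_prefix,
+  ROUND(SAFE_DIVIDE(COUNTIF(NOT STARTS_WITH(job_id, 'rmisqlfactory_')), COUNT(*)), 3) AS missing_prefix_share
+FROM `region-us`.INFORMATION_SCHEMA.JOBS
+WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
+  AND job_type = 'QUERY'
+  AND (
+    query LIKE '%historical_travel_time%'
+    OR query LIKE '%recent_roads_data%'
+    OR query LIKE '%routes_status%'
+  )
+GROUP BY user_email
+ORDER BY jobs_missing_prefix DESC;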
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa3_derived_resources.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa3_derived_resources.sql
new file mode 100644
index 0000000..35d7a15
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa3_derived_resources.sql
@@ -0,0 +1,57 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- BigQuery Admin Query 3: Identify Derived Tables and Views
+-- Business Question: What tables or views in my project are derived from the core RMI dataset?
+-- Use Case: Critical for lineage auditing and change management. Identifies 'shadow' analytical assets that may need to be updated or retired if the core RMI schema changes.
+-- Product Stage: GA (Uses BigQuery INFORMATION_SCHEMA)
+-- Estimated Bytes Processed: N/A (Metadata Query)
+
+/*
+ LINEAGE PATTERN: Metadata Dependency Mapping
+ This query scans the project metadata to find any VIEW definition that
+ references RMI core tables, as well as any clones or snapshots
+ targeting 'rmi' or 'road' named resources.
+*/
+
+-- Replace `your-project.your-dataset` with the location of your analytical workspace.
+
+SELECT
+ table_schema AS dataset_id,
+ table_name AS resource_name,
+ 'VIEW' AS type,
+ -- view_definition allows the admin to see the exact transformation logic
+ view_definition as lineage_detail
+FROM `your-project.your-dataset.INFORMATION_SCHEMA.VIEWS`
+WHERE (
+ view_definition LIKE '%historical_travel_time%'
+ OR view_definition LIKE '%recent_roads_data%'
+ OR view_definition LIKE '%routes_status%'
+ )
+
+UNION ALL
+
+-- Also identify Clones and Snapshots (cost-effective analytical patterns)
+SELECT
+ table_schema AS dataset_id,
+ table_name AS resource_name,
+ table_type AS type,
+ 'N/A (Check Table Metadata for Base Table Lineage)' AS lineage_detail
+FROM `your-project.your-dataset.INFORMATION_SCHEMA.TABLES`
+WHERE (table_name LIKE '%rmi%' OR table_name LIKE '%road%')
+ AND table_type IN ('BASE TABLE', 'CLONE', 'SNAPSHOT')
+ -- Exclude the raw source tables themselves
+ AND table_name NOT IN ('historical_travel_time', 'recent_roads_data', 'routes_status')
+
+ORDER BY dataset_id, resource_name;
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa4_query_patterns.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa4_query_patterns.sql
new file mode 100644
index 0000000..f7666b6
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa4_query_patterns.sql
@@ -0,0 +1,81 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- BigQuery Admin Query 4: Detect Repeated Query Patterns for Optimization
+-- Business Question: What are the most frequent query patterns (joins, filters, JSON extractions) that could benefit from optimized downstream tables?
+-- Use Case: Enables proactive optimization. If many users are extracting the same JSON attribute or joining the same tables daily, the Admin can create a materialized view or flattened table to improve performance and reduce costs.
+-- Product Stage: GA (Uses BigQuery INFORMATION_SCHEMA)
+-- Estimated Bytes Processed: N/A (Metadata Query)
+
+/*
+ OPTIMIZATION PATTERN: Pattern Mining
+ This query analyzes your recent job history to identify common access patterns.
+
+ Note: Replace 'region-us' with your actual BigQuery region.
+*/
+
+WITH job_history AS (
+ SELECT
+ query,
+ total_bytes_processed
+ FROM `region-us`.INFORMATION_SCHEMA.JOBS
+ WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
+ AND job_type = 'QUERY'
+ AND statement_type = 'SELECT'
+ -- Heuristic: Focus on RMI-related queries
+ AND (
+ query LIKE '%historical_travel_time%'
+ OR query LIKE '%recent_roads_data%'
+ OR query LIKE '%routes_status%'
+ )
+),
+patterns AS (
+ SELECT
+ query,
+ -- Regex: Identify specific JSON attributes being extracted from 'route_attributes'
+ REGEXP_EXTRACT_ALL(query, r"JSON_EXTRACT_SCALAR\(route_attributes, '([^']+)'\)") as extracted_attributes,
+ -- Regex: Detect if the query performs SRI flattening (expensive unnest)
+ REGEXP_CONTAINS(query, r"UNNEST\(speed_reading_intervals\)") as uses_sri_unnest,
+ -- Regex: Detect queries that reference both tables in a FROM or JOIN clause (case-insensitive)
+ REGEXP_CONTAINS(query, r"(?i)(?:FROM|JOIN)\s+`[^`]+historical_travel_time`") AND REGEXP_CONTAINS(query, r"(?i)(?:FROM|JOIN)\s+`[^`]+routes_status`") as joins_hist_and_status
+ FROM job_history
+)
+SELECT
+ 'Frequent Attribute Extraction' as pattern_type,
+ attr as detail,
+ COUNT(*) as frequency
+FROM patterns, UNNEST(extracted_attributes) as attr
+GROUP BY 1, 2
+
+UNION ALL
+
+SELECT
+ 'Heavy SRI Processing' as pattern_type,
+ 'Uses UNNEST(speed_reading_intervals)' as detail,
+ COUNT(*) as frequency
+FROM patterns
+WHERE uses_sri_unnest
+GROUP BY 1, 2
+
+UNION ALL
+
+SELECT
+ 'Common Table Joins' as pattern_type,
+ 'Joins historical_travel_time and routes_status' as detail,
+ COUNT(*) as frequency
+FROM patterns
+WHERE joins_hist_and_status
+GROUP BY 1, 2
+
+ORDER BY frequency DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa5_partition_pruning.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa5_partition_pruning.sql
new file mode 100644
index 0000000..adcb628
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa5_partition_pruning.sql
@@ -0,0 +1,46 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- BigQuery Admin Query 5: Identify Queries with Inefficient Partition Pruning
+-- Business Question: Are there queries performing full table scans on 'historical_travel_time' instead of using the 'record_time' partition filter?
+-- Use Case: Detects 'expensive' behavior. Since RMI datasets are partitioned by day on 'record_time', any query that doesn't include a temporal filter will scan the entire history, significantly increasing costs.
+-- Product Stage: GA (Uses BigQuery INFORMATION_SCHEMA)
+-- Estimated Bytes Processed: N/A (Metadata Query)
+
+/*
+ AUDIT PATTERN: Pruning Heuristics
+ This query identifies large scans on the historical table.
+ It flags jobs whose 'total_bytes_processed' exceeds a fixed threshold
+ (100 GB by default), which is typically far more than a query restricted
+ to a few daily partitions needs to scan.
+
+ Note: Replace 'region-us' with your actual BigQuery region.
+*/
+
+SELECT
+ job_id,
+ user_email,
+ query,
+ -- Convert bytes to GB for readable performance auditing
+ total_bytes_processed / POW(1024, 3) AS gb_processed,
+ creation_time
+FROM `region-us`.INFORMATION_SCHEMA.JOBS
+WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
+ AND job_type = 'QUERY'
+ AND statement_type = 'SELECT'
+ AND query LIKE '%historical_travel_time%'
+ -- Heuristic: Trigger audit if scan volume exceeds 100 GB (adjustable baseline)
+ -- This suggests the user might have missed a partition pruning filter (record_time)
+ AND total_bytes_processed > 100 * POW(1024, 3)
+ORDER BY gb_processed DESC
+LIMIT 10;
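+
+-- Illustrative remediation (a sketch, not part of the audit itself): a flagged
+-- query can usually be fixed by bounding 'record_time' so that BigQuery can prune
+-- to the matching daily partitions. The one-day window and the aggregate below
+-- are arbitrary examples, not a recommended report.
+SELECT
+  selected_route_id,
+  AVG(duration_in_seconds) AS avg_duration_seconds
+FROM `boston_oct_2025_sample_data.historical_travel_time`
+-- Half-open range on the partitioning column, covering a single day
+WHERE record_time >= TIMESTAMP('2025-10-15')
+  AND record_time < TIMESTAMP('2025-10-16')
+GROUP BY selected_route_id;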
diff --git a/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa6_data_complexity_audit.sql b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa6_data_complexity_audit.sql
new file mode 100644
index 0000000..a647f3a
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/bigquery_admin/bqa6_data_complexity_audit.sql
@@ -0,0 +1,60 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- BigQuery Admin Query 6: Data Characteristics and Complexity Audit
+-- Business Question: What is the average spatial complexity (vertex count) and metadata size (routeAttributes) of my actual records?
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~450 MB (Full scan of geometry and attributes)
+
+/*
+ This query performs a deep audit of the data payload.
+ It is useful for understanding the impact of route precision and
+ custom attributes on storage and processing costs.
+*/
+
+-- Historical Spatial Complexity
+SELECT
+ 'historical_travel_time' as table_name,
+ COUNT(DISTINCT selected_route_id) as unique_routes,
+ AVG(BYTE_LENGTH(ST_ASBINARY(route_geometry))) as avg_geom_bytes,
+ AVG(ST_LENGTH(route_geometry) / 1000) as avg_route_length_km,
+ AVG(ST_NUMPOINTS(route_geometry)) as avg_num_points,
+ CAST(NULL AS FLOAT64) as avg_attr_bytes
+FROM `boston_oct_2025_sample_data.historical_travel_time`
+WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+
+UNION ALL
+
+-- Recent Spatial Complexity (Enriched)
+SELECT
+ 'recent_roads_data' as table_name,
+ COUNT(DISTINCT selected_route_id) as unique_routes,
+ AVG(BYTE_LENGTH(ST_ASBINARY(route_geometry))) as avg_geom_bytes,
+ AVG(ST_LENGTH(route_geometry) / 1000) as avg_route_length_km,
+ AVG(ST_NUMPOINTS(route_geometry)) as avg_num_points,
+ CAST(NULL AS FLOAT64) as avg_attr_bytes
+FROM `boston_oct_2025_sample_data.recent_roads_data`
+WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+
+UNION ALL
+
+-- Status Metadata Complexity
+SELECT
+ 'routes_status' as table_name,
+ COUNT(DISTINCT selected_route_id) as unique_routes,
+ CAST(NULL AS FLOAT64) as avg_geom_bytes,
+ CAST(NULL AS FLOAT64) as avg_route_length_km,
+ CAST(NULL AS FLOAT64) as avg_num_points,
+ AVG(BYTE_LENGTH(route_attributes)) as avg_attr_bytes
+FROM `boston_oct_2025_sample_data.routes_status`;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_engineer/de1_materialized_view.sql b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de1_materialized_view.sql
new file mode 100644
index 0000000..b1a4798
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de1_materialized_view.sql
@@ -0,0 +1,61 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Engineer Query 1: Create Materialized Subset
+-- Business Question: Generate a query to create a 7-day materialized view of historical_travel_time for a specific corridor.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+-- Metadata: Uses ALTER statements to apply technical descriptions to all columns and the view itself.
+
+-- NOTE: The source dataset (e.g., `boston_oct_2025_sample_data`) is a read-only subscription from Analytics Hub.
+-- This materialized view MUST be created in a separate, writable dataset within your project.
+-- Replace `your-project.your-dataset` with your target location.
+
+CREATE MATERIALIZED VIEW IF NOT EXISTS `your-project.your-dataset.storrow_drive_view`
+CLUSTER BY selected_route_id AS
+SELECT
+ selected_route_id,
+ display_name,
+ record_time,
+ duration_in_seconds,
+ static_duration_in_seconds,
+ route_geometry
+FROM `boston_oct_2025_sample_data.historical_travel_time`
+WHERE record_time >= TIMESTAMP_SUB(TIMESTAMP('2025-10-31'), INTERVAL 7 DAY)
+ AND display_name LIKE '%Storrow-Drive%';
+
+-- Applying view-level metadata
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+SET OPTIONS (
+ description="A subset of RMI historical travel time data for the Storrow Drive corridor, covering records from 2025-10-24 onward (7 days before the 2025-10-31 anchor)."
+);
+
+-- Applying column-level descriptions. ALTER COLUMN ... SET OPTIONS is documented for TABLE and VIEW; its use on a materialized view is an assumption to verify.
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+ALTER COLUMN selected_route_id SET OPTIONS(description="Unique identifier for the SelectedRoute resource.");
+
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+ALTER COLUMN display_name SET OPTIONS(description="User-provided descriptive name for the route.");
+
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+ALTER COLUMN record_time SET OPTIONS(description="The UTC timestamp representing when the route data was computed.");
+
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+ALTER COLUMN duration_in_seconds SET OPTIONS(description="The traffic-aware duration of the route in seconds.");
+
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+ALTER COLUMN static_duration_in_seconds SET OPTIONS(description="The traffic-unaware (static) duration of the route in seconds.");
+
+ALTER MATERIALIZED VIEW `your-project.your-dataset.storrow_drive_view`
+ALTER COLUMN route_geometry SET OPTIONS(description="The traffic-aware polyline geometry of the route as a GEOGRAPHY object.");
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_engineer/de2_data_cleaning.sql b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de2_data_cleaning.sql
new file mode 100644
index 0000000..02b1bb6
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de2_data_cleaning.sql
@@ -0,0 +1,46 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Engineer Query 2: Data Cleaning Transformation
+-- Business Question: Write a query that produces a "cleaned" version of the routes_status table, correctly casting the route_length.
+-- Product Stage: GA
+-- Estimated Bytes Processed: < 1 MB
+-- Metadata: Provides descriptions for transformed fields and the view itself.
+
+/*
+ PRE-REQUISITE: This query utilizes the custom routeAttribute 'route_length'
+ (intended physical length in meters), which has been pre-configured for
+ all routes in this sample dataset.
+*/
+
+CREATE OR REPLACE VIEW `your-project.your-dataset.routes_status_cleaned`
+(
+ selected_route_id OPTIONS(description="Unique identifier for the SelectedRoute resource."),
+ display_name OPTIONS(description="User-provided descriptive name for the route."),
+ status OPTIONS(description="Operational state (e.g., STATUS_RUNNING)."),
+ validation_error OPTIONS(description="Reason for failure if status is INVALID."),
+ route_length_meters OPTIONS(description="The pre-computed intended route length in meters, cast from the custom 'route_length' routeAttribute.")
+)
+OPTIONS(
+ description="A cleaned view of SelectedRoutes status, with the custom route_length attribute promoted to a typed column."
+)
+AS
+SELECT
+ selected_route_id,
+ display_name,
+ status,
+ validation_error,
+ CAST(JSON_VALUE(route_attributes, '$.route_length') AS FLOAT64) AS route_length_meters
+FROM `boston_oct_2025_sample_data.routes_status`
+WHERE status != 'STATUS_INVALID';
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_engineer/de3_sri_flattening.sql b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de3_sri_flattening.sql
new file mode 100644
index 0000000..10db599
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de3_sri_flattening.sql
@@ -0,0 +1,120 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Engineer Query 3: SRI Flattening (Scripted Version)
+-- Business Question: Create an optimized script to transform the latest 30 minutes of nested SRI data into a flattened format with spatial progress metrics and quality filters.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~10 MB (Optimized via Scripting and Static Partition Pruning)
+
+/*
+ BIGQUERY OPTIMIZATION PATTERN: Static vs. Dynamic Partition Pruning
+
+ This query uses BigQuery Scripting (DECLARE/SET) to force "Static Pruning".
+
+ 1. Static Pruning (This Pattern): By resolving 'anchor_date' and 'latest_timestamp' into
+ variables BEFORE the main SELECT, BigQuery treats them as constants. This allows the
+ optimizer to discard irrelevant partitions before scanning.
+
+ 2. Geometry Integrity Check: To ensure high-quality analysis, this query:
+ a) Calculates 'length_deviation_ratio' against pre-computed attributes.
+ b) Excludes 'MultiLineString' geometries to ensure we only process single,
+ continuous paths (ST_LineString).
+
+ 3. Noise Reduction: Final results exclude 'NORMAL' speed states and filter out
+ extremely short intervals (< 5 meters) that often represent GPS noise.
+*/
+
+-- Step 1: Define the static anchor date to narrow down partitions
+DECLARE anchor_date DATE DEFAULT '2025-10-31';
+
+-- Step 2: Find the exact latest timestamp and define the 30-minute window
+DECLARE latest_timestamp TIMESTAMP;
+SET latest_timestamp = (
+ SELECT MAX(record_time)
+ FROM `boston_oct_2025_sample_data.recent_roads_data`
+ WHERE record_time >= TIMESTAMP(anchor_date)
+);
+
+-- Step 3: Execute the flattening logic for the latest 30-minute window
+WITH base_intervals AS (
+ SELECT
+ r.selected_route_id,
+ r.record_time,
+ segment_offset as interval_index,
+ sri.speed as interval_speed_state,
+ -- Reconstruct the interval polyline from the array of interval points
+ ST_MAKELINE(sri.interval_coordinates) as interval_geometry,
+ -- Core metrics for integrity check
+ ST_LENGTH(r.route_geometry) as actual_route_length_meters,
+ CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) as intended_route_length_meters
+ FROM `boston_oct_2025_sample_data.recent_roads_data` r
+ JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id),
+ UNNEST(speed_reading_intervals) AS sri WITH OFFSET AS segment_offset
+ WHERE r.record_time >= TIMESTAMP(anchor_date)
+ -- Capture only records from the last 30 minutes
+ AND r.record_time > TIMESTAMP_SUB(latest_timestamp, INTERVAL 30 MINUTE)
+ -- Quality filter: Only process single, continuous paths
+ AND ST_GEOMETRYTYPE(r.route_geometry) = 'ST_LineString'
+),
+quality_filtered_intervals AS (
+ SELECT
+ *,
+ -- Deviation between intended and actual geometry length
+ SAFE_DIVIDE(ABS(actual_route_length_meters - intended_route_length_meters), intended_route_length_meters) as length_deviation_ratio
+ FROM base_intervals
+ -- Filter for high-integrity geometries (e.g., < 5% deviation)
+ WHERE SAFE_DIVIDE(ABS(actual_route_length_meters - intended_route_length_meters), intended_route_length_meters) < 0.05
+),
+metrics_calculation AS (
+ SELECT
+ *,
+ ST_LENGTH(interval_geometry) as interval_length_meters,
+ -- Roll-up sum of interval lengths to find cumulative distance from origin
+ SUM(ST_LENGTH(interval_geometry)) OVER (
+ PARTITION BY selected_route_id, record_time
+ ORDER BY interval_index
+ ) as cumulative_length_meters,
+ -- Count total intervals in the route for context
+ COUNT(*) OVER (
+ PARTITION BY selected_route_id, record_time
+ ) as total_intervals
+ FROM quality_filtered_intervals
+),
+position_ratios AS (
+ SELECT
+ *,
+ -- The end of the previous interval is the start of the current interval
+ COALESCE(LAG(cumulative_length_meters) OVER (
+ PARTITION BY selected_route_id, record_time
+ ORDER BY interval_index
+ ), 0.0) as start_length_meters
+ FROM metrics_calculation
+)
+SELECT
+ selected_route_id,
+ record_time,
+ interval_index,
+ total_intervals,
+ interval_speed_state,
+ interval_length_meters,
+ -- Rounded relative positions (0.000 to 1.000) within the trip
+ ROUND(SAFE_DIVIDE(start_length_meters, actual_route_length_meters), 3) as start_position_ratio,
+ ROUND(SAFE_DIVIDE(cumulative_length_meters, actual_route_length_meters), 3) as end_position_ratio,
+ length_deviation_ratio,
+ interval_geometry
+FROM position_ratios
+-- Filter for congested intervals and exclude noise (short intervals)
+WHERE interval_speed_state != 'NORMAL'
+ AND interval_length_meters >= 5
+ORDER BY selected_route_id, record_time, interval_index;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_engineer/de4_attribute_extraction.sql b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de4_attribute_extraction.sql
new file mode 100644
index 0000000..4fe6056
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de4_attribute_extraction.sql
@@ -0,0 +1,42 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Engineer Query 4: Attribute Extraction
+-- Business Question: Write a query that pivots the JSON route_attributes into distinct columns.
+-- Product Stage: GA
+-- Estimated Bytes Processed: < 1 MB
+-- Metadata: Enriches pivoted columns with business definitions.
+
+CREATE OR REPLACE VIEW `your-project.your-dataset.routes_enriched_attributes`
+(
+ selected_route_id OPTIONS(description="Unique identifier for the SelectedRoute resource."),
+ region OPTIONS(description="The geographical business region extracted from routeAttributes."),
+ tier OPTIONS(description="The service tier (e.g. priority, standard) extracted from routeAttributes."),
+ priority OPTIONS(description="The operational priority level assigned during registration."),
+ route_length_meters OPTIONS(description="The intended physical length of the route in meters, cast to FLOAT64 from routeAttributes.")
+)
+OPTIONS(
+ description="A denormalized view of SelectedRoute metadata, promoting custom JSON attributes into typed top-level columns."
+)
+AS
+SELECT
+ selected_route_id,
+ JSON_VALUE(route_attributes, '$.region') as region,
+ JSON_VALUE(route_attributes, '$.tier') as tier,
+ JSON_VALUE(route_attributes, '$.priority') as priority,
+ -- route_attributes values are always strings. Casting to FLOAT64 for numerical analysis.
+ CAST(JSON_VALUE(route_attributes, '$.route_length') AS FLOAT64) as route_length_meters
+FROM `boston_oct_2025_sample_data.routes_status`
+-- Example: Filtering by priority attribute
+-- WHERE JSON_VALUE(route_attributes, '$.priority') = 'high';
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_engineer/de5_freshness_audit.sql b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de5_freshness_audit.sql
new file mode 100644
index 0000000..2fbc182
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de5_freshness_audit.sql
@@ -0,0 +1,57 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Engineer Query 5: Data Freshness Audit
+-- Business Question: Which active routes have stopped receiving updates, indicating potential data gaps?
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB
+-- Metadata: Provides descriptions for the audit results.
+
+/*
+ AUDIT GOAL: Identify routes that are 'STATUS_RUNNING' but have no recent
+ records in historical_travel_time. This helps detect routes with
+ insufficient traffic or pipeline latency issues.
+*/
+
+CREATE OR REPLACE VIEW `your-project.your-dataset.route_freshness_audit`
+(
+ selected_route_id OPTIONS(description="Unique identifier for the SelectedRoute resource."),
+ display_name OPTIONS(description="Human-readable name of the route."),
+ last_updated OPTIONS(description="The timestamp of the most recent record found in historical_travel_time."),
+ hours_since_last_update OPTIONS(description="The age of the data in hours relative to the audit timestamp.")
+)
+OPTIONS(
+ description="Operational audit view to identify active routes with missing or stale travel time data."
+)
+AS
+WITH freshness AS (
+ SELECT
+ selected_route_id,
+ MAX(record_time) as last_updated
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ -- Scans the full sample month to find the latest record for every route
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ GROUP BY 1
+)
+SELECT
+ s.selected_route_id,
+ s.display_name,
+ f.last_updated,
+ -- Using '2025-11-01' as the reference 'Now' for this static sample dataset
+ TIMESTAMP_DIFF(TIMESTAMP('2025-11-01'), f.last_updated, HOUR) AS hours_since_last_update
+FROM `boston_oct_2025_sample_data.routes_status` s
+LEFT JOIN freshness f USING(selected_route_id)
+-- Focus on routes that SHOULD be providing data
+WHERE s.status = 'STATUS_RUNNING'
+ORDER BY hours_since_last_update DESC NULLS FIRST;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_engineer/de7_routes_status_snapshot.sql b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de7_routes_status_snapshot.sql
new file mode 100644
index 0000000..8ef40c9
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_engineer/de7_routes_status_snapshot.sql
@@ -0,0 +1,61 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Engineer Query 7: Automate Daily Snapshot of Routes Status (Scheduled Query)
+-- Business Question: How can I automate the historical tracking of my SelectedRoutes' status changes?
+-- Product Stage: GA
+-- Estimated Bytes Processed: < 1 MB
+-- Metadata: Inherits column descriptions from routes_status and adds snapshot metadata.
+
+/*
+ AUTOMATION EXAMPLE:
+ To schedule this snapshot to run every 24 hours using the bq CLI, run:
+
+ bq mk \
+ --transfer_config \
+ --project_id="your-project-id" \
+ --data_source=scheduled_query \
+ --display_name="Daily RMI Routes Status Snapshot" \
+ --target_dataset="your_dataset" \
+ --schedule="every 24 hours" \
+ --params='{
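+ "query":"INSERT INTO `your-project.your-dataset.routes_status_history` SELECT CURRENT_TIMESTAMP() AS snapshot_time, selected_route_id, display_name, status, validation_error, low_road_usage_start_time, route_attributes FROM `boston_oct_2025_sample_data.routes_status`"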
+ }'
+*/
+
+-- STEP 1: Initialize the partitioned history table with enriched metadata
+CREATE TABLE IF NOT EXISTS `your-project.your-dataset.routes_status_history` (
+ snapshot_time TIMESTAMP OPTIONS(description="The UTC timestamp when this snapshot was captured."),
+ selected_route_id STRING OPTIONS(description="Unique identifier for the SelectedRoute resource."),
+ display_name STRING OPTIONS(description="User-provided descriptive name for the route."),
+ status STRING OPTIONS(description="Current operational state (e.g., STATUS_RUNNING, STATUS_INVALID)."),
+ validation_error STRING OPTIONS(description="Detailed reason if the route failed validation."),
+ low_road_usage_start_time TIMESTAMP OPTIONS(description="Timestamp when low road usage was first detected."),
+ route_attributes STRING OPTIONS(description="JSON string of custom business metadata.")
+)
+PARTITION BY DATE(snapshot_time)
+CLUSTER BY selected_route_id;
+
+-- STEP 2: The Periodic Append Logic (Manually executable version)
+-- This statement appends the current state of all routes into the history table.
+INSERT INTO `your-project.your-dataset.routes_status_history`
+SELECT
+ CURRENT_TIMESTAMP() as snapshot_time,
+ selected_route_id,
+ display_name,
+ status,
+ validation_error,
+ low_road_usage_start_time,
+ route_attributes
+FROM `boston_oct_2025_sample_data.routes_status`;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds1_outlier_detection.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds1_outlier_detection.sql
new file mode 100644
index 0000000..7f99659
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds1_outlier_detection.sql
@@ -0,0 +1,71 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 1: Outlier Detection (Interquartile Range)
+-- Business Question: Which travel time records for a specific route are statistical outliers?
+-- Use Case: Automatically flags anomalous data points that could indicate extreme traffic events or potential data collection issues.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)
+
+/*
+ QUALITY FILTERS:
+ 1. continuous_path: Excludes records where the geometry is not a single ST_LineString.
+ 2. length_integrity: Excludes records where actual physical length deviates by > 5%
+ from the intended 'route_length' attribute.
+*/
+
+WITH quality_filtered_history AS (
+ SELECT
+ h.selected_route_id,
+ h.record_time,
+ h.duration_in_seconds,
+ ST_LENGTH(h.route_geometry) as actual_length,
+ CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) as intended_length
+ FROM `boston_oct_2025_sample_data.historical_travel_time` h
+ JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id)
+ WHERE h.selected_route_id = 'route-4202493217'
+ AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- Quality filter: Only process single, continuous paths
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+ -- Quality filter: Length deviation check (< 5%)
+ AND SAFE_DIVIDE(ABS(ST_LENGTH(h.route_geometry) - CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)), CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)) < 0.05
+),
+stats AS (
+ SELECT
+ APPROX_QUANTILES(duration_in_seconds, 100)[OFFSET(25)] AS q1,
+ APPROX_QUANTILES(duration_in_seconds, 100)[OFFSET(75)] AS q3
+ FROM quality_filtered_history
+),
+outlier_thresholds AS (
+ SELECT
+ q1,
+ q3,
+ (q3 - q1) AS iqr,
+ q1 - (1.5 * (q3 - q1)) AS lower_bound,
+ q3 + (1.5 * (q3 - q1)) AS upper_bound
+ FROM stats
+)
+SELECT
+ h.record_time,
+ h.duration_in_seconds,
+ t.lower_bound,
+ t.upper_bound,
+ CASE
+ WHEN h.duration_in_seconds > t.upper_bound THEN 'High_Outlier'
+ WHEN h.duration_in_seconds < t.lower_bound THEN 'Low_Outlier'
+ END as outlier_type
+FROM quality_filtered_history h, outlier_thresholds t
+-- Filter for records outside the calculated IQR bounds
+WHERE (h.duration_in_seconds > t.upper_bound OR h.duration_in_seconds < t.lower_bound)
+ORDER BY h.record_time DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds2_similarity_clustering.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds2_similarity_clustering.sql
new file mode 100644
index 0000000..c30879c
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds2_similarity_clustering.sql
@@ -0,0 +1,70 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 2: Route Similarity Clustering (Feature-Based)
+-- Business Question: Which routes exhibit similar traffic patterns based on their average peak-hour delay ratios?
+-- Use Case: Grouping routes by behavioral similarity allows planners to apply similar mitigation strategies to entire clusters of road segments rather than analyzing each route individually.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ INTERPRETATION GUIDE:
+ Routes assigned to the same 'cluster_id' share a similar diurnal traffic profile
+ (the relationship between AM, Midday, and PM delays).
+
+ Example Interpretation:
+ - Cluster 1: Commuter Heavy (High AM/PM delay, low Midday).
+ - Cluster 2: Consistently Efficient (Delay ratio near 1.0 all day).
+ - Cluster 3: Midday Bottleneck (High Midday delay, typical AM/PM).
+*/
+
+-- Step 1: Create the K-Means model.
+-- NOTE: The source dataset (e.g., `boston_oct_2025_sample_data`) is a read-only subscription.
+-- This model MUST be created in a separate, writable dataset within your project.
+-- Replace `your-project.your-dataset` with your target location.
+
+CREATE OR REPLACE MODEL `your-project.your-dataset.route_clusters`
+OPTIONS(model_type='kmeans', num_clusters=5) AS
+SELECT
+ -- K-Means works with numerical features. We will use the delay ratios as features.
+ COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 7 AND 9 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_am_delay,
+ COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 12 AND 14 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_midday_delay,
+ COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 16 AND 18 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_pm_delay
+FROM `boston_oct_2025_sample_data.historical_travel_time`
+WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+GROUP BY selected_route_id;
+
+-- Step 2: Predict the cluster for each route using the trained model.
+WITH route_features AS (
+ SELECT
+ selected_route_id,
+ display_name,
+ COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 7 AND 9 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_am_delay,
+ COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 12 AND 14 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_midday_delay,
+ COALESCE(AVG(CASE WHEN EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 16 AND 18 THEN SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) END), 1.0) AS avg_pm_delay
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ GROUP BY 1, 2
+)
+SELECT
+ selected_route_id,
+ display_name,
+ CENTROID_ID AS cluster_id,
+ ROUND(avg_am_delay, 2) as am_ratio,
+ ROUND(avg_midday_delay, 2) as midday_ratio,
+ ROUND(avg_pm_delay, 2) as pm_ratio
+FROM ML.PREDICT(MODEL `your-project.your-dataset.route_clusters`,
+ (SELECT * FROM route_features)
+)
+ORDER BY cluster_id, selected_route_id;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds3_feature_engineering.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds3_feature_engineering.sql
new file mode 100644
index 0000000..8ff53b2
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds3_feature_engineering.sql
@@ -0,0 +1,80 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 3: Predictive Feature Engineering (Regularized Time-Series)
+-- Business Question: How can I prepare a high-quality, gap-aware feature set for training a predictive traffic model?
+-- Use Case: Demonstrates how to regularize a time-series using a timestamp grid. This ensures that window functions (LAG, AVG) accurately reflect chronological time even when records are missing due to quality filtering or detours.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB
+
+/*
+ HANDLING MISSING DATA (DETOURS/GAPS):
+ By joining the RMI data with a generated 'time_grid', we identify missing records.
+ Downstream models can then decide how to handle these nulls (e.g., interpolation,
+ imputation, or masking), preventing window functions from 'collapsing' time gaps.
+*/
+
+WITH quality_filtered_base AS (
+ SELECT
+ -- Truncating to hour to match the RMI collection interval
+ TIMESTAMP_TRUNC(h.record_time, HOUR) as record_hour,
+ h.duration_in_seconds,
+ ST_LENGTH(h.route_geometry) as actual_length,
+ CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) as intended_length
+ FROM `boston_oct_2025_sample_data.historical_travel_time` h
+ JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id)
+ WHERE h.selected_route_id = 'route-4202493217'
+ AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- Quality filter: Only process single, continuous paths
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+),
+hourly_averages AS (
+ -- Aggregate to a single record per hour before regularizing
+ SELECT
+ record_hour,
+ AVG(duration_in_seconds) as avg_duration,
+ COUNT(*) as samples_in_hour
+ FROM quality_filtered_base
+ WHERE SAFE_DIVIDE(ABS(actual_length - intended_length), intended_length) < 0.05
+ GROUP BY 1
+),
+time_grid AS (
+ -- Generate a continuous hourly grid for the study period
+ SELECT hour
+ FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2025-10-01', '2025-10-31 23:00:00', INTERVAL 1 HOUR)) as hour
+),
+regularized_series AS (
+ SELECT
+ g.hour,
+ a.avg_duration as duration_in_seconds,
+ COALESCE(a.samples_in_hour, 0) as samples_in_hour,
+ a.avg_duration IS NULL as is_missing_data
+ FROM time_grid g
+ LEFT JOIN hourly_averages a ON g.hour = a.record_hour
+)
+SELECT
+ hour,
+ ROUND(duration_in_seconds, 2) as duration_in_seconds,
+ samples_in_hour,
+ is_missing_data,
+ -- Lags follow the hourly grid, so they always refer to -1hr and -2hr (NULL when that hour is missing)
+ ROUND(LAG(duration_in_seconds, 1) OVER (ORDER BY hour), 2) AS lag_1hr_duration,
+ ROUND(LAG(duration_in_seconds, 2) OVER (ORDER BY hour), 2) AS lag_2hr_duration,
+ -- Rolling average over the current and two preceding grid hours (AVG ignores NULL hours)
+ ROUND(AVG(duration_in_seconds) OVER (
+ ORDER BY hour
+ ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
+ ), 2) AS rolling_avg_3hr
+FROM regularized_series
+ORDER BY hour DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds4_route_integrity_audit.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds4_route_integrity_audit.sql
new file mode 100644
index 0000000..8e90023
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds4_route_integrity_audit.sql
@@ -0,0 +1,91 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 4: Route Integrity Audit (Time-Windowed)
+-- Business Question: When and for how long did specific routes experience extreme geometry deviations?
+-- Use Case: Identifies persistent "integrity incidents" rather than transient noise. By grouping consecutive failed records into windows, engineers can correlate failures with specific infrastructure changes, GPS outages, or registration updates.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)
+
+/*
+ DEFINITION: Route Integrity
+ In RMI, 'Route Integrity' measures the spatial consistency between a route's
+ registered definition and its actual data collection performance.
+
+ - The Baseline: 'intended_length' (meters) provided as a custom attribute during registration.
+ - The Signal: 'actual_length' (meters) calculated from the captured ST_LineString.
+ - High Integrity: A ratio near 1.0 (actual length matches intended definition).
+ - Low Integrity: Significant deviations (> 10%) indicate detours, missing road
+ segments, or incorrect metadata registration.
+*/
+
+/*
+ ANALYTICAL PATTERN: Islands and Gaps
+ This query groups consecutive records that exceed a 10% length deviation threshold
+ into discrete failure windows.
+*/
+
+WITH base_comparison AS (
+ SELECT
+ h.selected_route_id,
+ h.display_name,
+ h.record_time,
+ ST_LENGTH(h.route_geometry) AS actual_length,
+ CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64) AS intended_length
+ FROM `boston_oct_2025_sample_data.historical_travel_time` h
+ JOIN `boston_oct_2025_sample_data.routes_status` s USING (selected_route_id)
+ WHERE s.status = 'STATUS_RUNNING'
+ AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- Quality filter: Exclude non-continuous geometries
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+),
+outlier_flagging AS (
+ SELECT
+ *,
+ -- Flag if deviation exceeds 10% (actual / intended)
+ IF(intended_length IS NOT NULL AND (SAFE_DIVIDE(actual_length, intended_length) > 1.1 OR SAFE_DIVIDE(actual_length, intended_length) < 0.9), 1, 0) as is_outlier
+ FROM base_comparison
+),
+streak_identification AS (
+ SELECT
+ *,
+ -- A new streak starts if this is an outlier and the previous record (by time) wasn't
+ -- (COALESCE treats a route's first record, where LAG returns NULL, as following a non-outlier)
+ IF(is_outlier = 1 AND COALESCE(LAG(is_outlier) OVER (PARTITION BY selected_route_id ORDER BY record_time), 0) = 0, 1, 0) as is_streak_start
+ FROM outlier_flagging
+),
+streak_grouping AS (
+ SELECT
+ *,
+ -- Cumulative sum of starts creates a unique ID for each failure window
+ SUM(is_streak_start) OVER (PARTITION BY selected_route_id ORDER BY record_time) as streak_id
+ FROM streak_identification
+ WHERE is_outlier = 1
+)
+SELECT
+ selected_route_id,
+ display_name,
+ MIN(record_time) as failure_start,
+ MAX(record_time) as failure_end,
+ COUNT(*) as consecutive_records,
+ -- Average ratio across the window: Severity of the discrepancy
+ ROUND(AVG(SAFE_DIVIDE(actual_length, intended_length)), 2) as avg_deviation_ratio,
+ -- Identify if the deviation is an over-count (likely detour) or under-count (missing segments)
+ IF(AVG(actual_length) > AVG(intended_length), 'OVER_COUNT', 'UNDER_COUNT') as failure_type
+FROM streak_grouping
+GROUP BY selected_route_id, display_name, streak_id
+-- Minimum window length: 1 keeps every window; raise it (e.g., to 3) to keep only persistent issues
+HAVING consecutive_records >= 1
+ORDER BY failure_start DESC, avg_deviation_ratio DESC
+LIMIT 50;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds5_reliability_ranking.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds5_reliability_ranking.sql
new file mode 100644
index 0000000..6646f13
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds5_reliability_ranking.sql
@@ -0,0 +1,89 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 5: Persistent Unreliability Audit (Time-Windowed)
+-- Business Question: When and for how long did specific routes experience persistent travel time spikes?
+-- Use Case: Identifies chronic congestion incidents rather than transient variance. By grouping consecutive "slow" records into windows, operators can distinguish between random noise and actionable infrastructure failures or major events.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)
+
+/*
+ DEFINITION: Route Reliability vs. Route Integrity
+ - Route Integrity (DS4): Measures spatial consistency (Actual Geometry vs. Registered Definition).
+ - Route Reliability (DS5): Measures temporal performance (Actual Travel Time vs. Free-flow Baseline).
+
+ High Reliability means a route's travel time is stable and near its ideal baseline.
+ Low Reliability (this query) indicates persistent periods of 'excess delay'
+ where actual travel times significantly exceed free-flow estimates.
+*/
+
+/*
+ ANALYTICAL PATTERN: Reliability Gaps (Islands and Gaps)
+ 1. Use each record's static (free-flow) duration as the baseline.
+ 2. Flag records where travel time exceeds a 'significant delay' threshold (e.g., 1.5x baseline).
+ 3. Group consecutive flags into discrete failure windows (streaks).
+*/
+
+WITH quality_filtered_history AS (
+ -- Standard quality filtering to ensure we analyze healthy geometries
+ SELECT
+ h.selected_route_id,
+ h.display_name,
+ h.record_time,
+ h.duration_in_seconds,
+ h.static_duration_in_seconds
+ FROM `boston_oct_2025_sample_data.historical_travel_time` h
+ JOIN `boston_oct_2025_sample_data.routes_status` s USING(selected_route_id)
+ WHERE h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+ AND SAFE_DIVIDE(ABS(ST_LENGTH(h.route_geometry) - CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)), CAST(JSON_VALUE(s.route_attributes, '$.route_length') AS FLOAT64)) < 0.05
+),
+incident_flagging AS (
+ SELECT
+ *,
+ -- Threshold: Travel time is more than 50% above the static (free-flow) baseline
+ IF(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) > 1.5, 1, 0) as is_incident
+ FROM quality_filtered_history
+),
+streak_identification AS (
+ SELECT
+ *,
+ -- Identify the start of a new consecutive incident window
+ -- (COALESCE treats a route's first record, where LAG returns NULL, as following a non-incident)
+ IF(is_incident = 1 AND COALESCE(LAG(is_incident) OVER (PARTITION BY selected_route_id ORDER BY record_time), 0) = 0, 1, 0) as is_streak_start
+ FROM incident_flagging
+),
+streak_grouping AS (
+ SELECT
+ *,
+ -- Cumulative sum of starts creates a unique ID for each incident window
+ SUM(is_streak_start) OVER (PARTITION BY selected_route_id ORDER BY record_time) as streak_id
+ FROM streak_identification
+ WHERE is_incident = 1
+)
+SELECT
+ selected_route_id,
+ display_name,
+ MIN(record_time) as incident_start,
+ MAX(record_time) as incident_end,
+ -- Total consecutive records in this incident window
+ COUNT(*) as consecutive_samples,
+ -- Average severity of the delay during this window
+ ROUND(AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)), 2) as avg_delay_ratio
+FROM streak_grouping
+GROUP BY selected_route_id, display_name, streak_id
+-- Focus on persistent unreliability (lasting at least 3 samples)
+HAVING consecutive_samples >= 3
+ORDER BY avg_delay_ratio DESC, incident_start DESC
+LIMIT 50;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds6_travel_time_forecasting.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds6_travel_time_forecasting.sql
new file mode 100644
index 0000000..0f655de
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds6_travel_time_forecasting.sql
@@ -0,0 +1,106 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 6: Travel Time Forecasting (BigQuery ML ARIMA_PLUS)
+-- Business Question: Can we predict next week's peak travel times based on the last 21 days of history?
+-- Use Case: Demonstrates a complete predictive workflow: Training an ARIMA_PLUS model, evaluating its seasonal fit, and performing backtesting against actual results.
+-- Product Stage: GA (Uses BigQuery ML)
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ METHODOLOGY: TIME-SERIES BACKTESTING
+ To build trust in a traffic model, we use 'Backtesting'. We split our 31-day
+ sample dataset into two parts:
+ 1. Training Set (Weeks 1-3): The model 'learns' the route's diurnal and weekly rhythm.
+ 2. Validation Set (Week 4): We withhold this data from the model, then ask the
+ model to 'forecast' it. Comparing the forecast to reality gives us an
+ empirical accuracy score.
+*/
+
+/*
+ INTERPRETATION & VISUALIZATION GUIDE:
+
+ 1. REPORT INTERPRETATION:
+ - 'absolute_error': Smaller is better. Measures the magnitude of the prediction 'miss'.
+ - 'within_confidence_interval': This is your 'Anomaly Signal'.
+ - 'YES': Traffic is behaving normally/predictably.
+ - 'NO': A significant event occurred (accident, weather, gridlock) that
+ exceeded statistical expectations. This is the trigger for operational alerts.
+
+ 2. RECOMMENDED VISUALIZATIONS:
+ - Time-Series Line: Plot 'forecast_seconds' and 'actual_seconds' on the same Y-axis.
+ - Confidence Band: Plot 'lower_bound' and 'upper_bound' as a shaded area. Dots
+ (actuals) falling outside this band are your truly actionable traffic incidents.
+*/
+
+-- STEP 1: Train the ARIMA_PLUS model using a 3-week window.
+-- We use hourly aggregation (AVG) to regularize the input for the ARIMA algorithm.
+CREATE OR REPLACE MODEL `your-project.your-dataset.travel_time_forecast_model`
+OPTIONS(
+ model_type='ARIMA_PLUS',
+ time_series_timestamp_col='record_hour',
+ time_series_data_col='duration_in_seconds',
+ auto_arima=TRUE, -- Automatically finds the best P, D, Q parameters.
+ data_frequency='HOURLY',
+ clean_spikes_and_dips=TRUE -- Prevents one-off accidents from skewing the long-term trend.
+) AS
+SELECT
+ TIMESTAMP_TRUNC(record_time, HOUR) as record_hour,
+ AVG(duration_in_seconds) as duration_in_seconds
+FROM `boston_oct_2025_sample_data.historical_travel_time`
+WHERE selected_route_id = 'route-4202493217'
+ AND record_time >= '2025-10-01' AND record_time < '2025-10-22' -- 21 full days (Oct 1-21)
+GROUP BY 1;
+
+-- STEP 2: Evaluate the model's training metrics.
+-- This returns AIC, Log Likelihood, and identified seasonal periods (e.g., DAILY).
+-- A low AIC relative to other models indicates a better fit.
+SELECT * FROM ML.EVALUATE(MODEL `your-project.your-dataset.travel_time_forecast_model`);
+
+-- STEP 3: Compare Forecast vs. Actual for the 4th week (Backtesting).
+-- We forecast a 168-hour 'horizon' (7 full days) to cover the withheld fourth week of October.
+WITH forecast_data AS (
+ SELECT
+ forecast_timestamp,
+ forecast_value as predicted_duration,
+ prediction_interval_lower_bound as lower_bound,
+ prediction_interval_upper_bound as upper_bound
+ FROM ML.FORECAST(MODEL `your-project.your-dataset.travel_time_forecast_model`,
+ STRUCT(168 AS horizon, 0.9 AS confidence_level))
+),
+actual_data AS (
+ -- Aggregate actual withheld data to the same hourly grid for comparison.
+ SELECT
+ TIMESTAMP_TRUNC(record_time, HOUR) as record_hour,
+ AVG(duration_in_seconds) as actual_duration
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ WHERE selected_route_id = 'route-4202493217'
+ AND record_time >= '2025-10-22' AND record_time < '2025-10-29' -- 7 full days (Oct 22-28)
+ GROUP BY 1
+)
+SELECT
+ f.forecast_timestamp,
+ ROUND(f.predicted_duration, 1) as forecast_seconds,
+ ROUND(a.actual_duration, 1) as actual_seconds,
+ -- absolute_error: How many seconds off was the prediction?
+ ROUND(ABS(f.predicted_duration - a.actual_duration), 1) as absolute_error,
+ -- within_confidence_interval: Was reality within the 90% expected range?
+ IF(a.actual_duration BETWEEN f.lower_bound AND f.upper_bound, 'YES', 'NO') as within_confidence_interval,
+ -- Include bounds for visualization in tools like Looker Studio or Colab.
+ ROUND(f.lower_bound, 1) as lower_bound,
+ ROUND(f.upper_bound, 1) as upper_bound
+FROM forecast_data f
+LEFT JOIN actual_data a ON f.forecast_timestamp = a.record_hour
+WHERE a.actual_duration IS NOT NULL
+ORDER BY f.forecast_timestamp;
diff --git a/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds7_zero_shot_forecasting.sql b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds7_zero_shot_forecasting.sql
new file mode 100644
index 0000000..611e98b
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/data_scientist/ds7_zero_shot_forecasting.sql
@@ -0,0 +1,55 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Data Scientist Query 7: Zero-Shot Multi-Route Forecasting (TimesFM)
+-- Business Question: Can we immediately forecast next-day traffic for multiple routes without waiting to train individual models?
+-- Use Case: Demonstrates 'Zero-Shot' forecasting using Google's Time Series Foundation Model (TimesFM). Unlike ARIMA, this model uses pre-trained patterns to predict future travel times for an entire cluster of routes simultaneously, even with limited local history.
+-- Product Stage: GA (Uses AI.FORECAST with TimesFM)
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ ANALYTICAL ADVANTAGE: Foundation Models vs. Traditional Models
+ - ARIMA_PLUS (DS6): Requires 'Training' (Learning) on specific route history first.
+ - TimesFM (DS7): Uses 'Zero-Shot' inference via AI.FORECAST. It applies global
+ patterns to your data immediately. Ideal for 'Cold Start' (new routes) or
+ scaling to thousands of routes without per-route training overhead.
+*/
+
+-- STEP 1: Prepare a 'Context' window of history for multiple routes.
+-- This sample supplies 7 days of hourly history per route as context; adjust the window length to your data.
+WITH route_context AS (
+ SELECT
+ selected_route_id,
+ TIMESTAMP_TRUNC(record_time, HOUR) as record_hour,
+ AVG(duration_in_seconds) as duration_in_seconds
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ -- We pick a 7-day context window for 3 specific routes
+ WHERE selected_route_id IN ('route-4202493217', 'route-3850158153', 'route-381361371')
+ AND record_time >= '2025-10-14' AND record_time < '2025-10-21' -- exactly 7 days
+ GROUP BY 1, 2
+)
+-- STEP 2: Use AI.FORECAST to generate predictions.
+-- Note: TimesFM is a managed foundation model; no CREATE MODEL is required.
+SELECT
+ *
+FROM AI.FORECAST(
+ TABLE route_context,
+ data_col => 'duration_in_seconds',
+ timestamp_col => 'record_hour',
+ model => 'TimesFM 2.0', -- Specify the foundation model version
+ id_cols => ['selected_route_id'], -- Forecast each route independently
+ horizon => 24, -- Forecast 24 hours ahead
+ confidence_level => 0.9 -- Generate 90% confidence intervals
+)
+ORDER BY selected_route_id, forecast_timestamp;
diff --git a/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip1_usage_projection.sql b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip1_usage_projection.sql
new file mode 100644
index 0000000..25e7b97
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip1_usage_projection.sql
@@ -0,0 +1,42 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- RMI Seller Query 1: Usage Growth Projection
+-- Business Question: Based on current data, what is the rate of record creation, and how will it scale?
+-- Use Case: Helps sales teams estimate BigQuery storage and compute growth as a customer increases their monitoring footprint from a small pilot to an enterprise-wide fleet.
+-- Product Stage: GA
+-- Estimated Bytes Processed: < 1 MB (Standard SQL on RMI Tables)
+
+WITH daily_stats AS (
+ SELECT
+ DATE(record_time) as log_date,
+ selected_route_id,
+ COUNT(*) as records_per_day
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ GROUP BY 1, 2
+),
+avg_usage AS (
+ SELECT
+ AVG(records_per_day) as avg_daily_records_per_route
+ FROM daily_stats
+)
+SELECT
+ ROUND(avg_daily_records_per_route, 2) as avg_records_per_route_per_day,
+ -- Parameter: Target fleet size (e.g., 5,000 routes)
+ 5000 as target_route_count,
+ ROUND(avg_daily_records_per_route * 5000, 0) as estimated_total_daily_records,
+ -- Extrapolate to monthly volume in millions of records
+ ROUND(avg_daily_records_per_route * 5000 * 30 / 1000000, 2) as estimated_monthly_millions
+FROM avg_usage;
diff --git a/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip2_customer_roi.sql b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip2_customer_roi.sql
new file mode 100644
index 0000000..344d43d
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip2_customer_roi.sql
@@ -0,0 +1,35 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- RMI Seller Query 2: Customer ROI (Value at Risk)
+-- Business Question: How much total time is lost to congestion across different customer service tiers?
+-- Use Case: Translates raw traffic data into "Business Value" by quantifying the potential time savings for priority routes, justifying the monitoring cost and providing a clear ROI for the customer.
+-- Product Stage: GA
+-- Estimated Bytes Processed: < 1 MB (Standard SQL on RMI Tables)
+
+SELECT
+ JSON_VALUE(s.route_attributes, '$.tier') as service_tier,
+ -- Aggregate total lost time (Actual - Free-flow) converted to hours
+ ROUND(SUM(duration_in_seconds - static_duration_in_seconds) / 3600, 1) as total_delay_hours,
+ COUNT(DISTINCT h.selected_route_id) as monitored_routes,
+ -- Average delay ratio (actual / free-flow), computed over delayed records only
+ ROUND(AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)), 2) as avg_delay_index
+FROM `boston_oct_2025_sample_data.historical_travel_time` h
+JOIN `boston_oct_2025_sample_data.routes_status` s ON h.selected_route_id = s.selected_route_id
+WHERE h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- Filter for records where actual was slower than free-flow
+ AND (duration_in_seconds - static_duration_in_seconds) > 0
+GROUP BY 1
+HAVING service_tier IS NOT NULL
+ORDER BY total_delay_hours DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip3_segment_estimation.sql b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip3_segment_estimation.sql
new file mode 100644
index 0000000..be60f31
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip3_segment_estimation.sql
@@ -0,0 +1,46 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- RMI Seller Query 3: Road Segment Estimation
+-- Business Question: How many physical road segments exist in our target study area, categorized by class?
+-- Use Case: Helps sales and solution architects estimate the "Total Addressable Monitoring" footprint for a city, aiding in pricing and coverage strategy.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~1 MB (Uses BigQuery Public Dataset: Overture Maps)
+
+/*
+ EXTERNAL DEPENDENCY:
+ RMI monitoring is based on user-defined routes. To understand the underlying
+ physical scale of an area, this query joins with the Overture Maps public
+ dataset to provide a baseline count of all physical road segments.
+*/
+
+WITH target_boundary AS (
+ SELECT geometry
+ FROM `bigquery-public-data.overture_maps.division_area`
+ WHERE names.primary = 'Boston'
+ AND country = 'US'
+ AND region = 'US-MA'
+ AND class = 'land'
+)
+SELECT
+ -- Group by physical road classification (e.g., motorway, primary, local)
+ class as road_class,
+ subtype,
+ COUNT(*) as segment_count,
+ ROUND(SUM(ST_LENGTH(s.geometry)) / 1000, 2) as total_length_km
+FROM `bigquery-public-data.overture_maps.segment` s
+JOIN target_boundary b ON ST_INTERSECTS(s.geometry, b.geometry)
+WHERE s.subtype = 'road'
+GROUP BY 1, 2
+ORDER BY segment_count DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip4_area_boundary.sql b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip4_area_boundary.sql
new file mode 100644
index 0000000..659a210
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/rmi_planner/rmip4_area_boundary.sql
@@ -0,0 +1,50 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- RMI Seller Query 4: Create Reusable Area Boundary
+-- Business Question: How can I create a reusable administrative boundary for my target study area from open data?
+-- Use Case: Establishes a "Master Boundary" for a city or region using public data. This view can then be joined with RMI tables to automate geofencing and localized reporting.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~1 MB (Uses BigQuery Public Dataset: Overture Maps)
+
+/*
+ NOTE: This query creates a persistent view of a target city's official boundary.
+ The source dataset (e.g., `boston_oct_2025_sample_data`) is read-only.
+ This view MUST be created in a separate, writable dataset within your project.
+ Replace `your-project.your-dataset` with your target location.
+*/
+
+CREATE OR REPLACE VIEW `your-project.your-dataset.target_area_boundary`
+(
+ division_id OPTIONS(description="Stable identifier for the administrative division."),
+ area_name OPTIONS(description="The primary display name (e.g. Boston)."),
+ region OPTIONS(description="The ISO state/province code (e.g. US-MA)."),
+ country OPTIONS(description="The ISO country code."),
+ geometry OPTIONS(description="The physical land boundary of the division as a GEOGRAPHY polygon.")
+)
+OPTIONS(
+ description="A reusable administrative boundary for geofencing RMI analytical assets."
+)
+AS
+SELECT
+ id AS division_id,
+ names.primary AS area_name,
+ region,
+ country,
+ geometry
+FROM `bigquery-public-data.overture_maps.division_area`
+WHERE names.primary = 'Boston'
+ AND country = 'US'
+ AND region = 'US-MA'
+ AND class = 'land';
diff --git a/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom1_peak_hour_delay.sql b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom1_peak_hour_delay.sql
new file mode 100644
index 0000000..acfe2ce
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom1_peak_hour_delay.sql
@@ -0,0 +1,53 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Traffic Operations Manager Query 1: Peak Hour Delay Analysis
+-- Business Question: What is the average travel time delay during the morning peak (7-9 AM) for the top 10 most congested routes?
+-- Use Case: Identifies critical morning commute bottlenecks to inform operational decisions or public messaging.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)
+
+/*
+ ANALYTICAL PATTERN: Temporal Filtering
+ This query uses EXTRACT(HOUR...) on a converted DATETIME to focus on local
+ Boston peak windows. It filters for active routes and applies a quality
+ check to ensure the geometry is a single ST_LineString.
+*/
+
+WITH peak_hour_data AS (
+ SELECT
+ h.selected_route_id,
+ h.display_name,
+ -- delay_ratio > 1.0 indicates travel time is slower than free-flow (static)
+ SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds) AS delay_ratio
+ FROM `boston_oct_2025_sample_data.historical_travel_time` AS h
+ JOIN `boston_oct_2025_sample_data.routes_status` AS s USING (selected_route_id)
+ WHERE h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- STATUS_RUNNING ensures we only analyze routes that are currently being monitored
+ AND s.status = 'STATUS_RUNNING'
+ -- AM Peak Window: 7:00 AM to 8:59 AM Local Time
+ AND EXTRACT(HOUR FROM DATETIME(h.record_time, 'America/New_York')) BETWEEN 7 AND 8
+ -- Geometry Integrity: Only process continuous, healthy paths
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+)
+SELECT
+ display_name,
+ ROUND(AVG(delay_ratio), 3) AS avg_delay_ratio,
+ COUNT(*) AS sample_count
+FROM peak_hour_data
+GROUP BY 1
+-- Threshold: Filter for routes that are at least marginally slower than free-flow
+HAVING avg_delay_ratio > 1.0
+ORDER BY avg_delay_ratio DESC
+LIMIT 10;
diff --git a/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom2_persistent_bottlenecks.sql b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom2_persistent_bottlenecks.sql
new file mode 100644
index 0000000..ddd90f2
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom2_persistent_bottlenecks.sql
@@ -0,0 +1,48 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Traffic Operations Manager Query 2: Persistent Bottlenecks (Segment-Level)
+-- Business Question: Which road segments (SRIs) have been in a 'TRAFFIC_JAM' state most frequently?
+-- Use Case: Locates recurring local bottlenecks within routes, enabling targeted infrastructure investigation or signal timing adjustments.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~250 MB (Requires UNNEST of speed_reading_intervals)
+
+/*
+ ANALYTICAL PATTERN: SRI Unnesting
+ RMI routes store segment-level traffic states (SRI) in a nested array.
+ This query 'explodes' that array using UNNEST to audit the frequency of
+ severe congestion across the entire network.
+*/
+
+WITH exploded_sris AS (
+ SELECT
+ selected_route_id,
+ display_name,
+ -- 'speed' represents the RMI traffic state for that specific interval
+ sri.speed
+ FROM `boston_oct_2025_sample_data.recent_roads_data`,
+ UNNEST(speed_reading_intervals) AS sri
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+)
+SELECT
+ selected_route_id,
+ display_name,
+ -- Count every interval observed in a 'TRAFFIC_JAM' state, aggregated per route
+ COUNT(*) AS traffic_jam_count
+FROM exploded_sris
+-- Filter exclusively for the most severe RMI congestion state
+WHERE speed = 'TRAFFIC_JAM'
+GROUP BY 1, 2
+ORDER BY traffic_jam_count DESC
+LIMIT 10;
diff --git a/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom3_operational_health.sql b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom3_operational_health.sql
new file mode 100644
index 0000000..2467b61
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom3_operational_health.sql
@@ -0,0 +1,42 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Traffic Operations Manager Query 3: Operational Health Check
+-- Business Question: Which active routes are currently flagged with a 'LOW_ROAD_USAGE' validation error?
+-- Use Case: Monitors the reliability of data collection. Low usage flags indicate that insights for these routes may be based on fewer probes, requiring a review of route priority or placement.
+-- Product Stage: GA
+-- Estimated Bytes Processed: < 1 MB
+
+/*
+ ANALYTICAL PATTERN: Status Auditing
+ This query inspects the management plane table (routes_status) to identify
+ active routes that have quality warnings. This is critical for maintaining
+ trust in downstream traffic analytics.
+*/
+
+SELECT
+ display_name,
+ selected_route_id,
+ status,
+ validation_error,
+ -- 'low_road_usage_start_time' is specifically populated when probe density drops below threshold
+ low_road_usage_start_time,
+ -- Time elapsed since the error was first detected (relative to sample end date)
+ DATETIME_DIFF(DATETIME('2025-11-01'), DATETIME(low_road_usage_start_time, 'UTC'), DAY) AS days_in_error_state
+FROM `boston_oct_2025_sample_data.routes_status`
+-- We only care about errors on routes that are supposed to be active (STATUS_RUNNING)
+WHERE status = 'STATUS_RUNNING'
+ -- Filter specifically for the Low Road Usage warning
+ AND validation_error = 'VALIDATION_ERROR_LOW_ROAD_USAGE'
+ORDER BY days_in_error_state DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom4_data_latency.sql b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom4_data_latency.sql
new file mode 100644
index 0000000..95fc063
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom4_data_latency.sql
@@ -0,0 +1,49 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Traffic Operations Manager Query 4: Data Collection Latency
+-- Business Question: Are there any active routes that have stopped sending data near the end of the snapshot period?
+-- Use Case: Detects localized data gaps or "silent" routes in real-time, enabling operators to investigate issues before they impact reporting.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)
+
+/*
+ ANALYTICAL PATTERN: Freshness Monitoring
+ By comparing the max record_time per route against the overall dataset
+ end-time, we can identify routes that have 'gone silent'.
+*/
+
+WITH last_data_arrival AS (
+ SELECT
+ selected_route_id,
+ -- Get the latest record timestamp for every route in the dataset
+ MAX(record_time) AS last_arrival
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ -- Focused partition scan for the full sample month
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ GROUP BY 1
+)
+SELECT
+ s.selected_route_id,
+ s.display_name,
+ l.last_arrival,
+ -- Measured relative to the very end of the sample dataset ('2025-11-01')
+ TIMESTAMP_DIFF(TIMESTAMP('2025-11-01 00:00:00'), l.last_arrival, MINUTE) AS minutes_of_silence
+FROM `boston_oct_2025_sample_data.routes_status` s
+LEFT JOIN last_data_arrival l USING (selected_route_id)
+-- Focus on routes that are supposed to be producing data
+WHERE s.status = 'STATUS_RUNNING'
+ -- Threshold: no record in the last 2 minutes of the dataset (NULL = no records at all)
+ AND (l.last_arrival IS NULL OR l.last_arrival < TIMESTAMP('2025-10-31 23:58:00'))
+ORDER BY minutes_of_silence DESC NULLS FIRST;
diff --git a/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom5_significant_event_detection.sql b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom5_significant_event_detection.sql
new file mode 100644
index 0000000..a90db73
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/traffic_operations_manager/tom5_significant_event_detection.sql
@@ -0,0 +1,47 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Traffic Operations Manager Query 5: Significant Event Detection
+-- Business Question: Which routes experienced a travel time more than double their static baseline in the last 24 hours?
+-- Use Case: Automates the detection of extreme traffic events (accidents, severe weather, gridlock) that require immediate operational intervention.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~151 MB (Requires JOIN with routes_status)
+
+/*
+ ANALYTICAL PATTERN: Threshold-Based Alerting
+ This query identifies major traffic incidents by flagging records where the
+ actual travel time is more than 2x the free-flow baseline (static_duration_in_seconds).
+ It applies quality filters to ensure alerts are only triggered for single,
+ continuous paths.
+*/
+
+SELECT
+ h.display_name,
+ h.selected_route_id,
+ h.record_time,
+ h.duration_in_seconds,
+ h.static_duration_in_seconds,
+ -- Delay ratio > 2.0 means travel time is more than twice the free-flow baseline
+ ROUND(SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds), 2) AS delay_ratio
+FROM `boston_oct_2025_sample_data.historical_travel_time` AS h
+JOIN `boston_oct_2025_sample_data.routes_status` AS s USING(selected_route_id)
+-- Filter for the final day (last 24 hours) of the sample dataset
+WHERE h.record_time BETWEEN '2025-10-31' AND '2025-11-01'
+ -- Focus on active monitoring fleet
+ AND s.status = 'STATUS_RUNNING'
+ -- Filter for "Significant" events
+ AND SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds) > 2.0
+ -- Quality filter: Exclude non-continuous geometries
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+ORDER BY delay_ratio DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/urban_planner/up1_corridor_trend.sql b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up1_corridor_trend.sql
new file mode 100644
index 0000000..7123751
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up1_corridor_trend.sql
@@ -0,0 +1,46 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Urban Planner Query 1: Corridor Performance Trend
+-- Business Question: What has been the week-over-week trend in the average delay ratio for a specific corridor?
+-- Use Case: Enables long-term performance monitoring of critical transportation infrastructure, helping planners identify if congestion is worsening or improving over time.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ ANALYTICAL PATTERN: Weekly Trend Aggregation
+ This query truncates timestamps to the week level to smooth out day-to-day
+ fluctuations, focusing on the macro traffic behavior of a critical route.
+*/
+
+WITH weekly_trends AS (
+ SELECT
+ selected_route_id,
+ -- Truncate to the start of the week for consistent aggregation
+ TIMESTAMP_TRUNC(record_time, WEEK) AS week,
+ -- Calculate average delay (Actual / Free-flow baseline)
+ AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)) AS avg_delay_ratio
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ -- Filter for a specific corridor of interest (e.g., Storrow Drive)
+ WHERE selected_route_id = 'route-4202493217'
+ AND record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ GROUP BY 1, 2
+)
+SELECT
+ selected_route_id,
+ -- Year-week label; %U counts Sunday-based weeks, matching TIMESTAMP_TRUNC(..., WEEK)
+ FORMAT_TIMESTAMP("%Y-%U", week) AS year_week,
+ ROUND(avg_delay_ratio, 3) AS avg_delay_ratio
+FROM weekly_trends
+ORDER BY week;
diff --git a/roads_management_insights/rmi_sample_queries/queries/urban_planner/up2_impact_analysis.sql b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up2_impact_analysis.sql
new file mode 100644
index 0000000..36cd845
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up2_impact_analysis.sql
@@ -0,0 +1,50 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Urban Planner Query 2: Before-and-After Impact Analysis
+-- Business Question: Has the average travel time on routes passing through a recent construction zone improved since the project's completion date?
+-- Use Case: Provides empirical evidence of infrastructure project success, validating whether road improvements (like new lanes or signals) actually reduced congestion.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ ANALYTICAL PATTERN: Spatial & Milestone Join
+ This query uses a DECLARE statement for the study area geometry to ensure
+ BigQuery treats the polygon as a constant, enabling efficient spatial
+ indexing during the ST_INTERSECTS filter. It then segments the data based
+ on a chronological project milestone.
+*/
+
+-- Study Area: Downtown Boston Intersection
+DECLARE study_area GEOGRAPHY DEFAULT ST_GEOGFROMTEXT('POLYGON((-71.06 42.35, -71.05 42.35, -71.05 42.34, -71.06 42.34, -71.06 42.35))');
+-- Project Milestone: Date when construction was completed
+DECLARE completion_date DATE DEFAULT '2025-10-15';
+
+WITH impact_data AS (
+ SELECT
+ -- Split records into 'Before' and 'After' buckets by local calendar date (record_time is a TIMESTAMP)
+ DATE(record_time, 'America/New_York') >= completion_date AS is_after_completion,
+ SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds) AS delay_ratio
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ -- Filter for routes that physically pass through the study zone
+ WHERE ST_INTERSECTS(route_geometry, study_area)
+ AND record_time BETWEEN '2025-10-01' AND '2025-11-01'
+)
+SELECT
+ is_after_completion,
+ ROUND(AVG(delay_ratio), 3) AS avg_delay_ratio,
+ COUNT(*) AS sample_count
+FROM impact_data
+GROUP BY 1
+ORDER BY 1;
diff --git a/roads_management_insights/rmi_sample_queries/queries/urban_planner/up3_monitoring_density.sql b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up3_monitoring_density.sql
new file mode 100644
index 0000000..d6db411
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up3_monitoring_density.sql
@@ -0,0 +1,47 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Urban Planner Query 3: Traffic Monitoring Density
+-- Business Question: Which geographic areas show the highest concentration of RMI route monitoring?
+-- Use Case: Helps planners identify "blind spots" in their monitoring network or confirm that critical urban zones are sufficiently covered by RMI probes.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ ANALYTICAL PATTERN: Spatial Grid Aggregation
+ This query maps the RMI monitoring footprint by calculating route centroids
+ and grouping them into a ~110 m grid (3 decimal places of a degree). This provides a
+ coarse-grained view of network density without high computational overhead.
+*/
+
+WITH route_centroids AS (
+ SELECT
+ selected_route_id,
+ -- Use the centroid to represent the general location of the route polyline
+ ST_CENTROID(route_geometry) AS centroid
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+)
+SELECT
+ -- Grid the coordinates to a precision of ~110 m (0.001 degrees)
+ ROUND(ST_Y(centroid), 3) AS lat_grid,
+ ROUND(ST_X(centroid), 3) AS lon_grid,
+ -- Count unique route definitions in this grid cell
+ COUNT(DISTINCT selected_route_id) AS unique_routes_monitored,
+ -- Count total traffic samples captured in this grid cell
+ COUNT(*) AS total_samples_collected
+FROM route_centroids
+GROUP BY 1, 2
+ORDER BY unique_routes_monitored DESC
+LIMIT 20;
diff --git a/roads_management_insights/rmi_sample_queries/queries/urban_planner/up4_weekend_vs_weekday.sql b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up4_weekend_vs_weekday.sql
new file mode 100644
index 0000000..2c6e4e8
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up4_weekend_vs_weekday.sql
@@ -0,0 +1,50 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Urban Planner Query 4: Weekend vs. Weekday Trends
+-- Business Question: How does average travel time in the afternoon (2-5 PM) differ between weekdays and weekends?
+-- Use Case: Informs urban policy decisions like congestion pricing or off-peak transit scheduling by highlighting when road demand is most elastic.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ ANALYTICAL PATTERN: Day-Type Segmentation
+ This query uses EXTRACT(DAYOFWEEK...) to categorize records into binary
+ 'Weekday' or 'Weekend' buckets. It combines this with a peak-window filter
+ to provide a clean comparison of temporal demand shifts.
+*/
+
+WITH afternoon_stats AS (
+ SELECT
+ -- Day segmentation: 1 = Sunday, 7 = Saturday
+ CASE
+ WHEN EXTRACT(DAYOFWEEK FROM DATETIME(record_time, 'America/New_York')) IN (1, 7) THEN 'Weekend'
+ ELSE 'Weekday'
+ END AS day_type,
+ duration_in_seconds,
+ static_duration_in_seconds
+ FROM `boston_oct_2025_sample_data.historical_travel_time`
+ WHERE record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- Afternoon period: 2:00 PM to 4:59 PM Local Time (Boston)
+ AND EXTRACT(HOUR FROM DATETIME(record_time, 'America/New_York')) BETWEEN 14 AND 16
+)
+SELECT
+ day_type,
+ -- Calculate Average Delay Index (Actual / Ideal)
+ ROUND(AVG(SAFE_DIVIDE(duration_in_seconds, static_duration_in_seconds)), 3) AS avg_delay_ratio,
+ ROUND(AVG(duration_in_seconds), 2) AS avg_duration_seconds,
+ COUNT(*) AS sample_count
+FROM afternoon_stats
+GROUP BY 1
+ORDER BY avg_delay_ratio DESC;
diff --git a/roads_management_insights/rmi_sample_queries/queries/urban_planner/up5_geofenced_congestion.sql b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up5_geofenced_congestion.sql
new file mode 100644
index 0000000..78c8b1d
--- /dev/null
+++ b/roads_management_insights/rmi_sample_queries/queries/urban_planner/up5_geofenced_congestion.sql
@@ -0,0 +1,52 @@
+-- Copyright 2026 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- https://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Urban Planner Query 5: Geofenced Congestion
+-- Business Question: Within a specific downtown polygon, which routes are currently seeing travel times more than 50% above their static baseline?
+-- Use Case: Enables targeted traffic management and demand-response strategies within high-density zones or during special events.
+-- Product Stage: GA
+-- Estimated Bytes Processed: ~150 MB
+
+/*
+ ANALYTICAL PATTERN: Spatial Geofencing
+ This query uses a DECLARE statement for the downtown polygon to ensure
+ BigQuery treats the study area as a constant, enabling efficient spatial
+ indexing during the ST_INTERSECTS filter. It identifies routes that are
+ physically impacted by a specific urban zone.
+*/
+
+-- Study Area: Downtown Boston Geofence
+DECLARE downtown_zone GEOGRAPHY DEFAULT ST_GEOGFROMTEXT('POLYGON((-71.066 42.358, -71.052 42.358, -71.052 42.348, -71.066 42.348, -71.066 42.358))');
+
+WITH intersecting_routes AS (
+ SELECT
+ h.selected_route_id,
+ h.display_name,
+ SAFE_DIVIDE(h.duration_in_seconds, h.static_duration_in_seconds) AS delay_ratio
+ FROM `boston_oct_2025_sample_data.historical_travel_time` h
+ WHERE ST_INTERSECTS(h.route_geometry, downtown_zone)
+ AND h.record_time BETWEEN '2025-10-01' AND '2025-11-01'
+ -- Quality filter: Exclude non-continuous geometries
+ AND ST_GEOMETRYTYPE(h.route_geometry) = 'ST_LineString'
+)
+SELECT
+ selected_route_id,
+ display_name,
+ ROUND(AVG(delay_ratio), 3) AS avg_delay_ratio,
+ COUNT(*) AS sample_count
+FROM intersecting_routes
+GROUP BY 1, 2
+-- Threshold: Keep routes whose average travel time is more than 1.5x the free-flow baseline
+HAVING avg_delay_ratio > 1.5
+ORDER BY avg_delay_ratio DESC;