diff --git a/places_insights/notebooks/analyze_site_performance/README.md b/places_insights/notebooks/analyze_site_performance/README.md
new file mode 100644
index 0000000..13e82c7
--- /dev/null
+++ b/places_insights/notebooks/analyze_site_performance/README.md
@@ -0,0 +1,25 @@
+# Analyze Site Performance with Places Insights and BigQuery ML
+
+This directory contains a complete Geospatial Machine Learning workflow demonstrating how to combine internal operational metrics with external environmental data to diagnose the location factors that drive site success.
+
+By leveraging **Places Insights**, **BigQuery ML**, and **H3 Spatial Indexing**, this sample shows how to move beyond anecdotal explanations and quantify how local competitive density and neighborhood characteristics relate to performance.
+
+## Directory Contents
+
+* **`places_insights_analyze_site_performance_bigquery_ml.ipynb`**: The primary interactive workflow. It demonstrates how to ingest site data, engineer features using Spatial Joins (`ST_DWITHIN`) against the Places Insights dataset, train a linear regression model on robust-scaled features in BigQuery ML, and visualize city-wide expansion opportunities using an interactive H3 grid map.
+* **`places_insights_analyze_site_performance_data_generation.ipynb`**: An optional supplementary notebook. It demonstrates how to dynamically generate a realistic, synthetic training dataset of store locations in London by scoring geographic points based on their proximity to real-world amenities.
+* **`store_performance_london.csv`**: The static, pre-generated dataset created by the data generation notebook. This allows users to run the main BigQuery ML workflow immediately without needing to generate their own data.
+
+## Getting Started
+
+### Prerequisites
+
+To execute these notebooks, you will need:
+1. **Google Cloud Project**: With billing enabled and BigQuery active.
+2. **Places Insights Access**: Your project must be subscribed to the [GB Places Insights dataset](https://developers.google.com/maps/documentation/placesinsights/cloud-setup) in BigQuery.
+3. **Google Maps Platform API Key**: Required to render the interactive map visualizations. You must enable the **Maps JavaScript API** and **Map Tiles API** on this key.
+
+### Execution Order
+
+1. *(Optional)* Run `places_insights_analyze_site_performance_data_generation.ipynb` to see how the synthetic correlation between performance and amenities is mathematically generated.
+2. Run `places_insights_analyze_site_performance_bigquery_ml.ipynb`. The notebook automatically fetches the provided `store_performance_london.csv` dataset directly from GitHub to proceed with the BigQuery ML training and prospecting visualization. *(Note: If you ran the optional data generation step, you can modify the notebook to ingest your custom generated file instead).*
\ No newline at end of file
diff --git a/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_bigquery_ml.ipynb b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_bigquery_ml.ipynb
new file mode 100644
index 0000000..c99bea4
--- /dev/null
+++ b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_bigquery_ml.ipynb
@@ -0,0 +1,731 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "private_outputs": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "code",
+ "source": [
+ "# Copyright 2026 Google LLC\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ],
+ "metadata": {
+ "id": "pDKqIF_IzOI6"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Analyze Site Performance with Places Insights and BigQuery ML\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub"
+ ],
+ "metadata": {
+ "id": "qcyzyPNI-YQ3"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### **Overview**\n",
+ "\n",
+ "This notebook demonstrates a Geospatial Machine Learning workflow. We will combine internal operational metrics (synthetic store performance) with external environmental data (**Places Insights**) to diagnose the location factors that drive success.\n",
+ "\n",
+ "By the end of this session, you will have a trained linear regression model (fit on robust-scaled features) and an interactive **Prospecting Map** that scores every neighborhood in London based on its amenity profile.\n",
+ "\n",
+ "### **Key Technologies**\n",
+ "\n",
+ "* **[Google Places Insights](https://developers.google.com/maps/documentation/placesinsights):** A BigQuery-native dataset providing aggregated counts of Places (POIs) without needing to query an API.\n",
+ "* **[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction):** Allows us to create, train, and deploy the machine learning model directly using standard SQL.\n",
+ "* **[H3 Spatial Indexing](https://h3geo.org/):** We use H3 to divide the city into uniform cells for consistent scoring and visualization.\n",
+ "* **[IPython Magics](https://googleapis.dev/python/bigquery-magics/latest/):** We use `%%bigquery` to write SQL directly in Colab cells.\n",
+ "\n",
+ "### **The Workflow**\n",
+ "\n",
+ "1. **Data Ingestion:** We upload a synthetic dataset of 400 to 500 stores across **London** with varying performance scores.\n",
+ "2. **Feature Engineering:** We use **Spatial Joins** (`ST_DWITHIN`) to count amenities (Gyms, Schools, Transit, etc.) within a 500m radius of every store.\n",
+ "3. **Model Training:** We train a linear regression model on robust-scaled features (`ML.ROBUST_SCALER`) to predict performance while limiting the influence of geospatial outliers.\n",
+ "4. **Evaluation:** We assess model accuracy using R² and Mean Absolute Error (MAE) on a holdout test set.\n",
+ "5. **City-Wide Prospecting:** Instead of scoring a single site, we apply the model to the **entire London H3 Grid** (Resolution 8) to visualize performance hotspots across the city.\n",
+ "6. **Clean Up:** We provide a final step to delete the dataset and all created tables/models from your Google Cloud project.\n",
+ "\n",
+ "### **Prerequisites & Setup**\n",
+ "\n",
+ "* **Google Cloud Project:** You need a project with BigQuery enabled.\n",
+ "* **Places Insights Access:** Your project must be subscribed to the [GB Places Insights dataset](https://developers.google.com/maps/documentation/placesinsights/cloud-setup) in BigQuery sharing.\n",
+ "* **Google Maps Platform [API Key](https://developers.google.com/maps/get-started):** Required to render the final interactive map visualization. Enable the [**Maps JavaScript API**](https://developers.google.com/maps/documentation/javascript/get-api-key?setupProd=enable#enable-the-api) and [**Map Tiles API**](https://developers.google.com/maps/documentation/tile/get-api-key?setupProd=enable#enable-the-api) on this key.\n",
+ "* **Colab Secrets:** Please add the following to the **Secrets** tab (Key icon on the left):\n",
+ " * `GCP_PROJECT_ID`: Your Google Cloud Project ID.\n",
+ " * `GMP_API_KEY`: The Google Maps API Key you configured in the previous step."
+ ],
+ "metadata": {
+ "id": "xqu18lHqJxM4"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "iBQ867lfDZRf",
+ "cellView": "form"
+ },
+ "outputs": [],
+ "source": [
+ "# @title 1a. Setup & Authentication\n",
+ "# @markdown Authenticate to Google Cloud, retrieve secrets, and initialize the BigQuery client.\n",
+ "# @markdown\n",
+ "# @markdown This cell creates the BigQuery dataset used throughout this notebook. Use the input below to select its region (default: US).\n",
+ "\n",
+ "import sys\n",
+ "import random\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "import pandas_gbq\n",
+ "from google.colab import auth, userdata\n",
+ "from google.cloud import bigquery\n",
+ "import requests\n",
+ "import geopandas as gpd\n",
+ "import folium\n",
+ "\n",
+ "# 1. Retrieve Secrets\n",
+ "GCP_PROJECT_ID = userdata.get('GCP_PROJECT_ID').strip()\n",
+ "print(f\"✅ Secrets retrieved for project: {GCP_PROJECT_ID}\")\n",
+ "GMP_API_KEY = userdata.get('GMP_API_KEY').strip()\n",
+ "print(f\"✅ GMP API Key retrieved.\")\n",
+ "\n",
+ "# 2. Authenticate User\n",
+ "auth.authenticate_user(project_id=GCP_PROJECT_ID)\n",
+ "print(\"✅ User Authenticated.\")\n",
+ "\n",
+ "# 3. Global Configuration\n",
+ "DATASET_ID = \"places_insights_site_perf_demo\"\n",
+ "REGION = \"US\" # @param {type:\"string\"}\n",
+ "STORES_TABLE = f\"{DATASET_ID}.store_performance\"\n",
+ "FEATURES_TABLE = f\"{DATASET_ID}.store_features\"\n",
+ "MODEL_NAME = f\"{DATASET_ID}.site_performance_model\"\n",
+ "\n",
+ "# 4. Initialize BigQuery Dataset\n",
+ "client = bigquery.Client(project=GCP_PROJECT_ID)\n",
+ "ds = bigquery.Dataset(f\"{GCP_PROJECT_ID}.{DATASET_ID}\")\n",
+ "ds.location = REGION\n",
+ "client.create_dataset(ds, exists_ok=True)\n",
+ "print(f\"✅ Working dataset ready: {GCP_PROJECT_ID}.{DATASET_ID}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 1b. Maps Backend Initialization: Session, Copyright & Assets\n",
+ "# @markdown This cell manages the Maps API handshake. It performs the following steps:\n",
+ "# @markdown 1. **Session Creation:** Authenticates and requests a \"Roadmap\" session for the target region.\n",
+ "# @markdown 2. **Attribution Fetching:** Queries the API for the copyright text required for the configured viewport.\n",
+ "# @markdown 3. **Asset Preparation:** Generates the HTML for the Google Maps logo overlay.\n",
+ "\n",
+ "# --- 1. Create Google Maps Session ---\n",
+ "print(\"🗺️ Initializing Google Maps Session...\")\n",
+ "session_url = f\"https://tile.googleapis.com/v1/createSession?key={GMP_API_KEY}\"\n",
+ "headers = {\"Content-Type\": \"application/json\"}\n",
+ "payload = {\n",
+ " \"mapType\": \"roadmap\",\n",
+ " \"language\": \"en-GB\",\n",
+ " \"region\": \"GB\"\n",
+ "}\n",
+ "\n",
+ "try:\n",
+ "    response = requests.post(session_url, json=payload, headers=headers)\n",
+ "    response.raise_for_status()\n",
+ "    session_token = response.json().get(\"session\")\n",
+ "    print(f\"✅ Session Token acquired.\")\n",
+ "except Exception as e:\n",
+ "    raise RuntimeError(f\"Failed to initialize Google Maps session: {e}\")\n",
+ "\n",
+ "# --- 2. Fetch Dynamic Attribution for London ---\n",
+ "# Center of our synthetic data area\n",
+ "LAT, LNG = 51.5074, -0.1278\n",
+ "ZOOM_LEVEL = 11\n",
+ "delta = 0.2\n",
+ "\n",
+ "viewport_url = (\n",
+ " f\"https://tile.googleapis.com/tile/v1/viewport?key={GMP_API_KEY}\"\n",
+ " f\"&session={session_token}\"\n",
+ " f\"&zoom={ZOOM_LEVEL}\"\n",
+ " f\"&north={LAT + delta}&south={LAT - delta}\"\n",
+ " f\"&west={LNG - delta}&east={LNG + delta}\"\n",
+ ")\n",
+ "\n",
+ "try:\n",
+ "    vp_response = requests.get(viewport_url)\n",
+ "    vp_response.raise_for_status()\n",
+ "    google_attribution = vp_response.json().get('copyright', 'Map data © Google')\n",
+ "    print(\"✅ Attribution fetched.\")\n",
+ "except Exception as e:\n",
+ "    print(f\"⚠️ Warning: Could not fetch attribution ({e}). Defaulting.\")\n",
+ "    google_attribution = \"Map data © Google\"\n",
+ "\n",
+ "# --- 3. Construct Logo HTML ---\n",
+ "logo_url = \"https://maps.gstatic.com/mapfiles/api-3/images/google_white3.png\"\n",
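+ "logo_html = f\"\"\"\n",
+ "<!-- Google logo overlay (bottom left); position and size are assumed values -->\n",
+ "<div style=\"position: fixed; bottom: 30px; left: 10px; z-index: 9999;\">\n",
+ "  <img src=\"{logo_url}\" alt=\"Google\" style=\"height: 20px;\">\n",
+ "</div>\n",
+ "\"\"\"\n",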
+ "print(\"✅ Logo HTML prepared.\")"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "mOPQnlbFHk3C"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 2. Import Data\n",
+ "\n",
+ "In this step, we import the pre-generated dataset representing **store locations in London**.\n",
+ "\n",
+ "This dataset contains:\n",
+ "* `store_id`: Unique identifier.\n",
+ "* `store_performance`: The synthetic performance score (0-100).\n",
+ "* `location`: The geospatial location (Point), stored as WKT.\n",
+ "\n",
+ "We will fetch the CSV from GitHub into a DataFrame and persist it to BigQuery to serve as the foundation for our model training."
+ ],
+ "metadata": {
+ "id": "qp-17vZ_F6-3"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 2a. Fetch Synthetic Data from GitHub\n",
+ "# @markdown We load the pre-generated `store_performance_london.csv` file directly from the public GitHub repository.\n",
+ "# @markdown\n",
+ "# @markdown *Curious how this synthetic dataset was created? Check out the [Data Generation Notebook](https://github.com/googlemaps-samples/insights-samples/blob/main/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb).*\n",
+ "import pandas as pd\n",
+ "\n",
+ "github_url = \"https://raw.githubusercontent.com/googlemaps-samples/insights-samples/main/places_insights/notebooks/analyze_site_performance/store_performance_london.csv\"\n",
+ "\n",
+ "print(\"⬇️ Fetching data from GitHub...\")\n",
+ "\n",
+ "# Read the CSV directly from the URL into a DataFrame\n",
+ "df_input = pd.read_csv(github_url)\n",
+ "\n",
+ "print(f\"✅ Successfully loaded {len(df_input)} rows.\")\n",
+ "display(df_input.head())"
+ ],
+ "metadata": {
+ "id": "Labi1lGgETuG"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 2b. Load Data to BigQuery\n",
+ "# @markdown We upload the DataFrame to the `STORES_TABLE` in BigQuery, declaring the `location` column as GEOGRAPHY so it is not stored as a plain string.\n",
+ "\n",
+ "# 1. Define Schema to ensure 'location' is parsed as GEOGRAPHY (not String)\n",
+ "table_schema = [\n",
+ " {'name': 'store_id', 'type': 'STRING'},\n",
+ " {'name': 'store_performance', 'type': 'FLOAT'},\n",
+ " {'name': 'location', 'type': 'GEOGRAPHY'}, # Critical: Casts WKT string to GEOGRAPHY\n",
+ "]\n",
+ "\n",
+ "# 2. Upload to BigQuery\n",
+ "print(f\"☁️ Uploading data to `{STORES_TABLE}`...\")\n",
+ "\n",
+ "pandas_gbq.to_gbq(\n",
+ " dataframe=df_input,\n",
+ " destination_table=STORES_TABLE,\n",
+ " project_id=GCP_PROJECT_ID,\n",
+ " if_exists='replace',\n",
+ " table_schema=table_schema,\n",
+ " location=REGION\n",
+ ")\n",
+ "\n",
+ "print(f\"✅ Successfully loaded data to `{STORES_TABLE}`.\")"
+ ],
+ "metadata": {
+ "id": "Ya668gzclbPB"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 3. Feature Engineering (Spatial Join)\n",
+ "# @markdown We now bridge the gap between internal performance data and the external world using a **Spatial Join**.\n",
+ "# @markdown\n",
+ "# @markdown **The Logic:**\n",
+ "# @markdown 1. **`ST_DWITHIN`:** For every store in our database, we look for Places within a **500-meter radius**.\n",
+ "# @markdown 2. **`COUNTIF`:** We calculate density vectors (e.g., \"How many gyms are nearby?\") to serve as our model features ($X$).\n",
+ "# @markdown 3. **Output:** The result is downloaded to the Python variable `df_features`.\n",
+ "\n",
+ "%%bigquery df_features --project $GCP_PROJECT_ID --location $REGION\n",
+ "\n",
+ "SELECT WITH AGGREGATION_THRESHOLD\n",
+ " internal.store_id,\n",
+ " internal.store_performance,\n",
+ "\n",
+ " -- Feature Engineering: count nearby POIs by type\n",
+ " COUNTIF('gym' IN UNNEST(places.types)) AS gym_count,\n",
+ " COUNTIF('restaurant' IN UNNEST(places.types)) AS restaurant_count,\n",
+ " COUNTIF('school' IN UNNEST(places.types)) AS school_count,\n",
+ " COUNTIF('transit_station' IN UNNEST(places.types)) AS transit_count,\n",
+ " COUNTIF('clothing_store' IN UNNEST(places.types)) AS clothing_store_count\n",
+ "\n",
+ "FROM\n",
+ " `places_insights_site_perf_demo.store_performance` AS internal\n",
+ "JOIN\n",
+ " `places_insights___gb.places` AS places\n",
+ " ON ST_DWITHIN(internal.location, places.point, 500) -- 500m Radius\n",
+ "WHERE\n",
+ " places.business_status = 'OPERATIONAL'\n",
+ "GROUP BY\n",
+ " internal.store_id, internal.store_performance"
+ ],
+ "metadata": {
+ "id": "J73jjZp8G7zh"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @markdown Save the engineered features back to a permanent BQ table for training.\n",
+ "\n",
+ "pandas_gbq.to_gbq( # type: ignore\n",
+ " dataframe=df_features, # type: ignore\n",
+ " destination_table=FEATURES_TABLE,\n",
+ " project_id=GCP_PROJECT_ID,\n",
+ " if_exists='replace',\n",
+ " location=REGION\n",
+ ")\n",
+ "\n",
+ "print(f\"✅ Training data saved to `{FEATURES_TABLE}`\")\n",
+ "display(df_features.head()) # type: ignore"
+ ],
+ "metadata": {
+ "id": "SJqBUR3PHAhh"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title **Exploratory Data Analysis: Feature Correlations**\n",
+ "# @markdown We use a **Pairplot** to visualize how each feature interacts with the target variable (`store_performance`).\n",
+ "# @markdown\n",
+ "# @markdown **Key Observations:**\n",
+ "# @markdown * **Linearity:** Look at the top row. You can see a clear positive linear trend between features like `restaurant_count` and `store_performance`, which suggests that a **Linear Regression** model is a reasonable choice for this data.\n",
+ "# @markdown * **Distributions:** The diagonal histograms show that our amenity counts are \"right-skewed\" (mostly low numbers with a few high-density hubs), which is typical for geospatial data.\n",
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "input_features = ['store_performance', 'gym_count', 'restaurant_count', 'school_count', 'transit_count', 'clothing_store_count']\n",
+ "g = sns.pairplot(df_features[input_features], plot_kws={\"s\": 3, 'alpha': 0.6}, diag_kws={'color': 'crimson'}, height=1.8) # type: ignore\n",
+ "g.set(xlim=(0, 100), ylim=(0, 100))\n",
+ "\n",
+ "plt.show()"
+ ],
+ "metadata": {
+ "id": "W9bWzCWcQ5Nm"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 4. Train the Linear Regression Model\n",
+ "# @markdown We now train a **Linear Regression** model to predict store performance.\n",
+ "# @markdown\n",
+ "# @markdown **Key Model Configuration:**\n",
+ "# @markdown * **`ML.ROBUST_SCALER`:** We use this within the `TRANSFORM` clause. Unlike standard scaling (Mean/StdDev), robust scaling uses the **Median** and **IQR**. This is critical for geospatial data, where a single location with 500 restaurants (an outlier) could otherwise skew the entire model.\n",
+ "# @markdown * **`AUTO_SPLIT`:** We let BigQuery choose the train/evaluation split. For inputs of 500 to 50,000 rows it reserves a random 20% for evaluation, so the model is tested on data it has never seen. With fewer than 500 rows it trains on every row and reserves nothing, in which case the evaluation metrics describe the training data.\n",
+ "# @markdown * **`NORMAL_EQUATION`:** Since our dataset is small, we use the exact mathematical solution rather than an iterative approximation.\n",
+ "# @markdown * **Outlier Removal:** We keep only stores with `store_performance < 75` to focus the model on predicting the mechanics of \"typical\" or \"developing\" sites, rather than established outliers.\n",
+ "\n",
+ "%%bigquery --project $GCP_PROJECT_ID --location $REGION\n",
+ "\n",
+ "CREATE OR REPLACE MODEL `places_insights_site_perf_demo.site_performance_model`\n",
+ "TRANSFORM(\n",
+ " store_performance,\n",
+ " -- Feature Engineering inside the model artifact\n",
+ " -- These stats are calculated on the TRAINING split only\n",
+ " ML.ROBUST_SCALER(gym_count) OVER() AS scaled_gym_count,\n",
+ " ML.ROBUST_SCALER(restaurant_count) OVER() AS scaled_restaurant_count,\n",
+ " ML.ROBUST_SCALER(school_count) OVER() AS scaled_school_count,\n",
+ " ML.ROBUST_SCALER(transit_count) OVER() AS scaled_transit_count,\n",
+ " ML.ROBUST_SCALER(clothing_store_count) OVER() AS scaled_clothing_store_count\n",
+ ")\n",
+ "OPTIONS(\n",
+ " model_type = 'LINEAR_REG',\n",
+ " input_label_cols = ['store_performance'],\n",
+ "\n",
+ " -- OPTIMIZATION PARAMETERS\n",
+ " optimize_strategy = 'NORMAL_EQUATION', -- Exact mathematical solution (fast for small data)\n",
+ " data_split_method = 'AUTO_SPLIT', -- Holds out 20% for evaluation on 500 to 50,000 rows; nothing below 500\n",
+ "\n",
+ " -- DIAGNOSTICS\n",
+ " enable_global_explain = TRUE -- Essential to see feature importance\n",
+ ")\n",
+ "AS\n",
+ "SELECT\n",
+ " gym_count,\n",
+ " restaurant_count,\n",
+ " school_count,\n",
+ " transit_count,\n",
+ " clothing_store_count,\n",
+ " store_performance\n",
+ "FROM\n",
+ " `places_insights_site_perf_demo.store_features`\n",
+ "WHERE\n",
+ " store_performance < 75"
+ ],
+ "metadata": {
+ "id": "F1fk_m2XHMgz"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 5. Evaluate model performance\n",
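+ "# @markdown We use `ML.EVALUATE` to test the model against the evaluation data reserved during training. `AUTO_SPLIT` holds out a random 20% for inputs of 500 to 50,000 rows; below 500 rows it holds out nothing and these metrics are computed on the training data.\n",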
+ "# @markdown The results (MAE, R2, etc.) are downloaded to the `df_eval` DataFrame for inspection in the next step.\n",
+ "\n",
+ "%%bigquery df_eval --project $GCP_PROJECT_ID --location $REGION\n",
+ "\n",
+ "SELECT *\n",
+ "FROM ML.EVALUATE(MODEL `places_insights_site_perf_demo.site_performance_model`)"
+ ],
+ "metadata": {
+ "id": "v_M7MRmsHQuD"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @markdown ### **Interpretation of Results**\n",
+ "# @markdown * **R2 Score:** Measures how well the amenities explain the performance. A score close to **1.0** indicates a perfect fit. Since our data is synthetic and linear, we expect a very high score here (> 0.9).\n",
+ "# @markdown * **Mean Absolute Error (MAE):** The average \"miss\" in points. For example, an MAE of **1.5** means the model's prediction is typically within +/- 1.5 points of the actual score.\n",
+ "\n",
+ "print(f\"R2 Score: {df_eval['r2_score'][0]:.4f}\") # type: ignore\n",
+ "print(f\"Mean Absolute Error: {df_eval['mean_absolute_error'][0]:.4f}\") # type: ignore\n",
+ "display(df_eval) # type: ignore"
+ ],
+ "metadata": {
+ "id": "GOWw427HHTdy"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 6. Score London by H3 Cell (Native Places Insights)\n",
+ "# @markdown We now apply our trained model to London using the native Places Insights H3 function.\n",
+ "# @markdown\n",
+ "# @markdown **The Approach:**\n",
+ "# @markdown 1. **H3 Indexing & Counting:** We use `PLACES_COUNT_PER_H3` to get pre-aggregated counts of amenities within a 25km radius of central London.\n",
+ "# @markdown 2. **Pivoting:** Because the function returns one row per amenity type, we use `UNION ALL` and group the results to create the feature columns (`gym_count`, `restaurant_count`, etc.).\n",
+ "# @markdown 3. **Batch Prediction:** We feed these \"Grid Features\" into `ML.PREDICT` to generate a `predicted_store_performance` score for every cell.\n",
+ "\n",
+ "%%bigquery df_h3_predictions --project $GCP_PROJECT_ID --location $REGION\n",
+ "\n",
+ "WITH combined_counts AS (\n",
+ " -- Gyms\n",
+ " SELECT h3_cell_index, geography, count, 'gym' AS type\n",
+ " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n",
+ " JSON_OBJECT(\n",
+ " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000), -- 25km radius around London\n",
+ " 'h3_resolution', 8,\n",
+ " 'business_status', ['OPERATIONAL'],\n",
+ " 'types', ['gym']\n",
+ " )\n",
+ " )\n",
+ " UNION ALL\n",
+ " -- Restaurants\n",
+ " SELECT h3_cell_index, geography, count, 'restaurant' AS type\n",
+ " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n",
+ " JSON_OBJECT(\n",
+ " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n",
+ " 'h3_resolution', 8,\n",
+ " 'business_status', ['OPERATIONAL'],\n",
+ " 'types', ['restaurant']\n",
+ " )\n",
+ " )\n",
+ " UNION ALL\n",
+ " -- Schools\n",
+ " SELECT h3_cell_index, geography, count, 'school' AS type\n",
+ " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n",
+ " JSON_OBJECT(\n",
+ " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n",
+ " 'h3_resolution', 8,\n",
+ " 'business_status', ['OPERATIONAL'],\n",
+ " 'types', ['school']\n",
+ " )\n",
+ " )\n",
+ " UNION ALL\n",
+ " -- Transit Stations\n",
+ " SELECT h3_cell_index, geography, count, 'transit_station' AS type\n",
+ " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n",
+ " JSON_OBJECT(\n",
+ " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n",
+ " 'h3_resolution', 8,\n",
+ " 'business_status', ['OPERATIONAL'],\n",
+ " 'types', ['transit_station']\n",
+ " )\n",
+ " )\n",
+ " UNION ALL\n",
+ " -- Clothing Stores\n",
+ " SELECT h3_cell_index, geography, count, 'clothing_store' AS type\n",
+ " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n",
+ " JSON_OBJECT(\n",
+ " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n",
+ " 'h3_resolution', 8,\n",
+ " 'business_status', ['OPERATIONAL'],\n",
+ " 'types', ['clothing_store']\n",
+ " )\n",
+ " )\n",
+ "),\n",
+ "aggregated_features AS (\n",
+ " -- Pivot the stacked rows back into standard feature columns for the ML Model\n",
+ " SELECT\n",
+ " h3_cell_index AS h3_index,\n",
+ " ANY_VALUE(geography) AS h3_geography,\n",
+ " SUM(IF(type = 'gym', count, 0)) AS gym_count,\n",
+ " SUM(IF(type = 'restaurant', count, 0)) AS restaurant_count,\n",
+ " SUM(IF(type = 'school', count, 0)) AS school_count,\n",
+ " SUM(IF(type = 'transit_station', count, 0)) AS transit_count,\n",
+ " SUM(IF(type = 'clothing_store', count, 0)) AS clothing_store_count\n",
+ " FROM\n",
+ " combined_counts\n",
+ " GROUP BY\n",
+ " h3_cell_index\n",
+ ")\n",
+ "\n",
+ "-- Feed the pivoted features into the model\n",
+ "SELECT\n",
+ " h3_index,\n",
+ " predicted_store_performance,\n",
+ " h3_geography,\n",
+ " gym_count,\n",
+ " restaurant_count\n",
+ "FROM\n",
+ " ML.PREDICT(MODEL `places_insights_site_perf_demo.site_performance_model`,\n",
+ " (SELECT * FROM aggregated_features)\n",
+ " )\n",
+ "ORDER BY\n",
+ " predicted_store_performance DESC;"
+ ],
+ "metadata": {
+ "id": "UssuC8508R0L"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 7. Display H3 Prospecting Map\n",
+ "# @markdown We render the H3 grid as a choropleth layer.\n",
+ "# @markdown * **Yellow Areas:** High predicted performance (Hotspots).\n",
+ "# @markdown * **Purple Areas:** Low predicted performance (Coldspots).\n",
+ "# @markdown * **Interactive:** Hover over any cell to see the underlying amenity counts driving the score.\n",
+ "\n",
+ "import geopandas as gpd\n",
+ "from folium import Element\n",
+ "from shapely import wkt\n",
+ "\n",
+ "# --- 1. Prepare Data ---\n",
+ "# Explicitly convert the WKT strings from BigQuery into Shapely Geometry objects\n",
+ "if isinstance(df_h3_predictions['h3_geography'].iloc[0], str):\n",
+ " df_h3_predictions['h3_geography'] = df_h3_predictions['h3_geography'].apply(wkt.loads)\n",
+ "\n",
+ "# Create GeoDataFrame\n",
+ "gdf_h3 = gpd.GeoDataFrame(df_h3_predictions, geometry='h3_geography', crs='EPSG:4326')\n",
+ "\n",
+ "# --- 2. Construct Tiles URL ---\n",
+ "tiles_url = f\"https://tile.googleapis.com/v1/2dtiles/{{z}}/{{x}}/{{y}}?session={session_token}&key={GMP_API_KEY}\"\n",
+ "\n",
+ "# --- 3. Initialize Map ---\n",
+ "m = folium.Map(\n",
+ " location=[51.5074, -0.1278],\n",
+ " zoom_start=11,\n",
+ " tiles=tiles_url,\n",
+ " attr=google_attribution,\n",
+ " name=\"Google Maps\",\n",
+ " control_scale=True,\n",
+ " prefer_canvas=True\n",
+ ")\n",
+ "\n",
+ "# --- 4. Add Google Logo (Bottom Left) ---\n",
+ "m.get_root().html.add_child(Element(logo_html))\n",
+ "\n",
+ "# --- 5. Add Custom Legend (Bottom Right) ---\n",
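+ "legend_html = \"\"\"\n",
+ "<!-- Legend overlay (bottom right); container styling is an assumed reconstruction -->\n",
+ "<div style=\"position: fixed; bottom: 30px; right: 10px; z-index: 9999; background: white; padding: 10px; border-radius: 4px; font-family: sans-serif; font-size: 12px;\">\n",
+ "  <b>Predicted Score</b>\n",
+ "  <div style=\"\n",
+ "    height: 10px;\n",
+ "    width: 160px;\n",
+ "    /* Purple -> Blue -> Teal -> Green -> Yellow */\n",
+ "    background: linear-gradient(to right, #440154, #3b528b, #21918c, #5ec962, #fde725);\n",
+ "    margin-top: 8px;\n",
+ "    margin-bottom: 4px;\n",
+ "  \"></div>\n",
+ "  <div style=\"display: flex; justify-content: space-between;\">\n",
+ "    <span>Low (~20)</span>\n",
+ "    <span>High (~80)</span>\n",
+ "  </div>\n",
+ "</div>\n",
+ "\"\"\"\n",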
+ "m.get_root().html.add_child(Element(legend_html))\n",
+ "\n",
+ "# --- 6. Overlay H3 Grid ---\n",
+ "gdf_h3.explore(\n",
+ " m=m,\n",
+ " column='predicted_store_performance',\n",
+ " cmap='viridis',\n",
+ " vmin=20,\n",
+ " vmax=80,\n",
+ " # Style Keywords for Polygons (remove borders for smoother look)\n",
+ " style_kwds={'stroke': False, 'fillOpacity': 0.6},\n",
+ " tooltip=[\n",
+ " 'h3_index',\n",
+ " 'predicted_store_performance',\n",
+ " 'gym_count',\n",
+ " 'restaurant_count'\n",
+ " # Note: 'transit_count' removed because it wasn't selected in the SQL query\n",
+ " ],\n",
+ " name=\"Prospecting Heatmap\"\n",
+ ")\n",
+ "\n",
+ "# Add layer control to toggle data on/off\n",
+ "folium.LayerControl().add_to(m)\n",
+ "\n",
+ "display(m)"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "FwVtreNaj_q8"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 8. Clean Up Resources\n",
+ "# @markdown This cell deletes the demo dataset (`places_insights_site_perf_demo`) and all tables and models within it.\n",
+ "# @markdown\n",
+ "# @markdown **You will be prompted to confirm before deletion.**\n",
+ "\n",
+ "from google.cloud.exceptions import NotFound\n",
+ "\n",
+ "# Validation\n",
+ "print(f\"⚠️ WARNING: You are about to DELETE the dataset: `{DATASET_ID}`\")\n",
+ "print(f\" Project: `{GCP_PROJECT_ID}`\")\n",
+ "print(\" This action cannot be undone.\")\n",
+ "\n",
+ "# Interactive Input\n",
+ "confirmation = input(\"Type 'yes' to proceed with deletion: \").strip().lower()\n",
+ "\n",
+ "if confirmation == 'yes':\n",
+ "    print(f\"\\n🗑️ Deleting dataset: {DATASET_ID}...\")\n",
+ "    try:\n",
+ "        # delete_contents=True removes tables inside the dataset\n",
+ "        # not_found_ok=True prevents errors if the dataset is already gone\n",
+ "        client.delete_dataset(DATASET_ID, delete_contents=True, not_found_ok=True)\n",
+ "        print(f\"✅ Successfully deleted dataset '{DATASET_ID}' and all contents.\")\n",
+ "    except Exception as e:\n",
+ "        print(f\"❌ Error deleting dataset: {e}\")\n",
+ "else:\n",
+ "    print(f\"\\n🛑 Operation cancelled. Dataset `{DATASET_ID}` was NOT deleted.\")"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "zlgKYS-CTewm"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb
new file mode 100644
index 0000000..aca7e9d
--- /dev/null
+++ b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb
@@ -0,0 +1,423 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "private_outputs": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "code",
+ "source": [
+ "# Copyright 2026 Google LLC\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ],
+ "metadata": {
+ "id": "DQy8mJqQzvB6"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Example Data Generation: Store Performance Model\n",
+ "\n",
+ "Open in Colab | Open in Colab Enterprise | Open in BigQuery Studio | View on GitHub"
+ ],
+ "metadata": {
+ "id": "9o7ZSvQX-2Kx"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Example Data Generation: Store Performance Model\n",
+ "\n",
+ "### **Overview**\n",
+ "This notebook serves as the data generation engine for the **Analyze Site Performance with Google Places Insights and BigQuery ML** notebook.\n",
+ "\n",
+ "Instead of relying on pre-canned data, we demonstrate how to create a realistic training dataset from scratch. We generate randomized store locations in London and perform a geospatial join with the Places Insights dataset. This allows us to calculate synthetic performance scores based on the density of amenities (gyms, restaurants, transit, etc.) surrounding each specific point.\n",
+ "\n",
+ "**Key Features of this Notebook:**\n",
+ "* **Real-time Scoring:** Dynamically calculate store performance based on proximity to real-world places.\n",
+ "* **Visual Verification:** Interactively explore the generated data on a **Google Map** to sanity-check the spatial distribution and performance hotspots.\n",
+ "* **Data Export:** Download the final dataset as a CSV file to be used in the main analysis notebook.\n",
+ "\n",
+ "### **The Methodology**\n",
+ "To simulate realistic business metrics, we model **Store Performance** ($Y$) as a linear function of the surrounding amenities. The formula has the form of a **Multiple Linear Regression**, but its coefficients are chosen by hand rather than fitted to data.\n",
+ "\n",
+ "The performance score is determined by the count of specific amenities (Predictors) within a **500m radius** of each store, plus a noise term.\n",
+ "\n",
+ "The mathematical model is defined as:\n",
+ "\n",
+ "$$\n",
+ "Y = \\beta_0 + \\beta_1 X_{\\text{gym}} + \\beta_2 X_{\\text{restaurant}} + \\beta_3 X_{\\text{school}} + \\beta_4 X_{\\text{transit}} + \\beta_5 X_{\\text{clothing}} + \\epsilon\n",
+ "$$\n",
+ "\n",
+ "**Where:**\n",
+ "\n",
+ "* $Y$: **Store Performance** (Response Variable), clipped to the range $[0, 100]$.\n",
+ "* $\\beta_0$: **Intercept**, set to a base value of **20**.\n",
+ "* $X_i$: **Predictors**, each representing the count of operational places of one type within 500m.\n",
+ "* $\\beta_i$: **Coefficients** (weights) assigned to each predictor:\n",
+ "    * $\\beta_1 = 0.2$ (Gyms)\n",
+ "    * $\\beta_2 = 0.4$ (Restaurants)\n",
+ "    * $\\beta_3 = 0.1$ (Schools)\n",
+ "    * $\\beta_4 = 0.1$ (Transit Stations)\n",
+ "    * $\\beta_5 = 0.2$ (Clothing Stores)\n",
+ "* $\\epsilon$: **Error Term** (Noise), drawn from a normal distribution with mean 0 and standard deviation 2 to introduce variance.\n",
+ "\n",
+ "### **Prerequisites & Setup**\n",
+ "\n",
+ "* **Google Cloud Project:** You need a project with BigQuery enabled.\n",
+ "* **Places Insights Access:** Your project must be subscribed to the [GB Places Insights dataset](https://developers.google.com/maps/documentation/placesinsights/cloud-setup) in BigQuery sharing.\n",
+ "* **Google Maps Platform [API Key](https://developers.google.com/maps/get-started):** Required to render the final interactive map visualization. Enable the [**Maps JavaScript API**](https://developers.google.com/maps/documentation/javascript/get-api-key?setupProd=enable#enable-the-api) and [**Map Tiles API**](https://developers.google.com/maps/documentation/tile/get-api-key?setupProd=enable#enable-the-api) on this key.\n",
+ "* **Colab Secrets:** Add the following to the **Secrets** tab (Key icon on the left):\n",
+ "    * `GCP_PROJECT_ID`: Your Google Cloud Project ID.\n",
+ "    * `GMP_API_KEY`: The Google Maps API Key you configured in the previous step."
+ ],
+ "metadata": {
+ "id": "SO62kdnXf9vq"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "HyozE39vag-O"
+ },
+ "outputs": [],
+ "source": [
+ "# @title 1. Setup & Authentication\n",
+ "import random\n",
+ "import requests\n",
+ "import pandas as pd\n",
+ "import geopandas as gpd\n",
+ "import folium\n",
+ "from folium import Element\n",
+ "from google.colab import auth, userdata\n",
+ "from google.cloud import bigquery\n",
+ "from shapely import wkt\n",
+ "\n",
+ "# 1. Retrieve Secrets\n",
+ "GCP_PROJECT_ID = userdata.get('GCP_PROJECT_ID').strip()\n",
+ "print(f\"✅ Secrets retrieved for project: {GCP_PROJECT_ID}\")\n",
+ "GMP_API_KEY = userdata.get('GMP_API_KEY').strip()\n",
+ "print(\"✅ GMP API Key retrieved.\")\n",
+ "\n",
+ "# 2. Authenticate User\n",
+ "auth.authenticate_user(project_id=GCP_PROJECT_ID)\n",
+ "print(\"✅ User Authenticated.\")\n",
+ "\n",
+ "# 3. Initialize BigQuery Client\n",
+ "client = bigquery.Client(project=GCP_PROJECT_ID)\n",
+ "print(\"✅ BigQuery Client Initialized.\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 2. Generate Synthetic Data & Calculate Scores\n",
+ "# @markdown Note: This cell takes ~2 minutes to execute.\n",
+ "from shapely import wkt\n",
+ "\n",
+ "print(\"Generating synthetic locations in London...\")\n",
+ "\n",
+ "# 1. Generate Random Locations & Noise\n",
+ "LAT_MIN, LAT_MAX = 51.30, 51.70\n",
+ "LNG_MIN, LNG_MAX = -0.50, 0.30\n",
+ "\n",
+ "sql_structs = []\n",
+ "\n",
+ "for i in range(1, 501):\n",
+ "    lng = random.uniform(LNG_MIN, LNG_MAX)\n",
+ "    lat = random.uniform(LAT_MIN, LAT_MAX)\n",
+ "    noise = random.gauss(0, 2)\n",
+ "\n",
+ "    # STRUCT construction\n",
+ "    sql_structs.append(\n",
+ "        f\"STRUCT('STORE_{i:03d}' as store_id, ST_GEOGPOINT({lng:.4f}, {lat:.4f}) as location, {noise:.4f} as noise)\"\n",
+ "    )\n",
+ "\n",
+ "generated_data_sql = \",\\n\".join(sql_structs)\n",
+ "\n",
+ "# 2. Construct Query\n",
+ "# We convert location to Text (ST_ASTEXT) to allow Grouping\n",
+ "query = f\"\"\"\n",
+ "WITH t AS (\n",
+ "  SELECT * FROM UNNEST([\n",
+ "    {generated_data_sql}\n",
+ "  ])\n",
+ ")\n",
+ "SELECT WITH AGGREGATION_THRESHOLD\n",
+ "  t.store_id,\n",
+ "\n",
+ "  ST_ASTEXT(t.location) as location_wkt,\n",
+ "\n",
+ "  -- Linear Model\n",
+ "  GREATEST(0, LEAST(100,\n",
+ "    20 +\n",
+ "    (0.2 * COUNTIF('gym' IN UNNEST(p.types))) +\n",
+ "    (0.4 * COUNTIF('restaurant' IN UNNEST(p.types))) +\n",
+ "    (0.1 * COUNTIF('school' IN UNNEST(p.types))) +\n",
+ "    (0.1 * COUNTIF('transit_station' IN UNNEST(p.types))) +\n",
+ "    (0.2 * COUNTIF('clothing_store' IN UNNEST(p.types))) +\n",
+ "    AVG(t.noise)\n",
+ "  )) AS store_performance\n",
+ "FROM\n",
+ "  t\n",
+ "LEFT JOIN\n",
+ "  `places_insights___gb.places` AS p\n",
+ "  ON ST_DWITHIN(t.location, p.point, 500)\n",
+ "  AND p.business_status = 'OPERATIONAL'\n",
+ "GROUP BY\n",
+ "  t.store_id, location_wkt\n",
+ "\"\"\"\n",
+ "\n",
+ "print(\"Executing BigQuery GIS join...\")\n",
+ "\n",
+ "# 3. Execute to standard DataFrame (not GeoDataFrame yet)\n",
+ "df = client.query(query).to_dataframe()\n",
+ "\n",
+ "# 4. Convert WKT String back to Geometry object\n",
+ "df['geometry'] = df['location_wkt'].apply(wkt.loads)\n",
+ "\n",
+ "# 5. Convert to GeoDataFrame (BigQuery GEOGRAPHY points are WGS84 longitude/latitude)\n",
+ "df_stores = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')\n",
+ "\n",
+ "# 6. Cleanup: Drop redundant text column and rename geometry to 'location'\n",
+ "# This matches the schema expected by BigQuery in the subsequent notebook.\n",
+ "df_stores = df_stores.drop(columns=['location_wkt'])\n",
+ "df_stores = df_stores.rename_geometry('location')\n",
+ "\n",
+ "print(f\"✅ Generated and scored {len(df_stores)} stores.\")\n",
+ "display(df_stores.head())"
+ ],
+ "metadata": {
+ "id": "8nkGpuuQa5Ez"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 3. Maps Backend Initialization: Session, Copyright & Assets\n",
+ "# @markdown This cell manages the API handshake. It performs the following steps:\n",
+ "# @markdown 1. **Session Creation:** Authenticates and requests a \"Roadmap\" session for the target region.\n",
+ "# @markdown 2. **Attribution Fetching:** Queries the API for the specific copyright text required for the configured viewport.\n",
+ "# @markdown 3. **Asset Preparation:** Generates the compliant HTML for the Google Maps logo overlay.\n",
+ "\n",
+ "# --- 1. Create Google Maps Session ---\n",
+ "print(\"🗺️ Initializing Google Maps Session...\")\n",
+ "session_url = f\"https://tile.googleapis.com/v1/createSession?key={GMP_API_KEY}\"\n",
+ "headers = {\"Content-Type\": \"application/json\"}\n",
+ "payload = {\n",
+ "    \"mapType\": \"roadmap\",\n",
+ "    \"language\": \"en-GB\",\n",
+ "    \"region\": \"GB\"\n",
+ "}\n",
+ "\n",
+ "try:\n",
+ "    response = requests.post(session_url, json=payload, headers=headers)\n",
+ "    response.raise_for_status()\n",
+ "    session_token = response.json().get(\"session\")\n",
+ "    print(\"✅ Session Token acquired.\")\n",
+ "except Exception as e:\n",
+ "    raise RuntimeError(f\"Failed to initialize Google Maps session: {e}\")\n",
+ "\n",
+ "# --- 2. Fetch Dynamic Attribution for London ---\n",
+ "# Center of our synthetic data area\n",
+ "LAT, LNG = 51.5074, -0.1278\n",
+ "ZOOM_LEVEL = 11\n",
+ "delta = 0.2\n",
+ "\n",
+ "viewport_url = (\n",
+ "    f\"https://tile.googleapis.com/tile/v1/viewport?key={GMP_API_KEY}\"\n",
+ "    f\"&session={session_token}\"\n",
+ "    f\"&zoom={ZOOM_LEVEL}\"\n",
+ "    f\"&north={LAT + delta}&south={LAT - delta}\"\n",
+ "    f\"&west={LNG - delta}&east={LNG + delta}\"\n",
+ ")\n",
+ "\n",
+ "try:\n",
+ "    vp_response = requests.get(viewport_url)\n",
+ "    vp_response.raise_for_status()\n",
+ "    google_attribution = vp_response.json().get('copyright', 'Map data © Google')\n",
+ "    print(\"✅ Attribution fetched.\")\n",
+ "except Exception as e:\n",
+ "    print(f\"⚠️ Warning: Could not fetch attribution ({e}). Defaulting.\")\n",
+ "    google_attribution = \"Map data © Google\"\n",
+ "\n",
+ "# --- 3. Construct Logo HTML ---\n",
+ "logo_url = \"https://maps.gstatic.com/mapfiles/api-3/images/google_white3.png\"\n",
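+ "logo_html = f\"\"\"\n",
+ "<div style=\"position: fixed; bottom: 20px; left: 10px; z-index: 1000;\">\n",
+ "    <img src=\"{logo_url}\" alt=\"Google\">\n",
+ "</div>\n",
+ "\"\"\"\n",
+ "print(\"✅ Logo HTML prepared.\")"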
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "-RHoNZHScZmQ"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 4. Display Map with Data Overlay\n",
+ "# @markdown Renders the interactive map with the following components:\n",
+ "# @markdown * **Google Maps 2D Tiles:** As the base layer.\n",
+ "# @markdown * **Store Data Overlay:** Points colored by their calculated `store_performance` score (Purple=Low, Yellow=High).\n",
+ "# @markdown * **Custom Controls:** A performance score legend and the required Google logo.\n",
+ "\n",
+ "# --- 1. Construct Tiles URL ---\n",
+ "tiles_url = f\"https://tile.googleapis.com/v1/2dtiles/{{z}}/{{x}}/{{y}}?session={session_token}&key={GMP_API_KEY}\"\n",
+ "\n",
+ "# --- 2. Initialize Map ---\n",
+ "m = folium.Map(\n",
+ "    location=[LAT, LNG],\n",
+ "    zoom_start=ZOOM_LEVEL,\n",
+ "    tiles=tiles_url,\n",
+ "    attr=google_attribution,\n",
+ "    name=\"Google Maps\",\n",
+ "    control_scale=True,\n",
+ "    prefer_canvas=True\n",
+ ")\n",
+ "\n",
+ "# --- 3. Add Google Logo (Bottom Left) ---\n",
+ "m.get_root().html.add_child(Element(logo_html))\n",
+ "\n",
+ "# --- 4. Add Custom Legend/Key (Bottom Right) ---\n",
+ "# We use a CSS linear-gradient that matches the 'viridis' colormap used below\n",
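+ "legend_html = \"\"\"\n",
+ "<div style=\"\n",
+ "    position: fixed;\n",
+ "    bottom: 30px;\n",
+ "    right: 10px;\n",
+ "    z-index: 1000;\n",
+ "    background: white;\n",
+ "    padding: 10px;\n",
+ "    font-size: 12px;\n",
+ "\">\n",
+ "    <b>Performance Score</b>\n",
+ "    <div style=\"\n",
+ "        height: 12px;\n",
+ "        width: 160px;\n",
+ "        /* Viridis: Purple -> Blue -> Teal -> Green -> Yellow */\n",
+ "        background: linear-gradient(to right, #440154, #3b528b, #21918c, #5ec962, #fde725);\n",
+ "        margin-top: 8px;\n",
+ "        margin-bottom: 4px;\n",
+ "    \"></div>\n",
+ "    <div style=\"display: flex; justify-content: space-between;\">\n",
+ "        <span>Low (~20)</span>\n",
+ "        <span>High (~80)</span>\n",
+ "    </div>\n",
+ "</div>\n",
+ "\"\"\"\n",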
+ "m.get_root().html.add_child(Element(legend_html))\n",
+ "\n",
+ "# --- 5. Overlay Store Data ---\n",
+ "df_stores.explore(\n",
+ "    m=m, # Add to our Google Map instance\n",
+ "    column='store_performance',\n",
+ "    vmin=20,\n",
+ "    vmax=80,\n",
+ "    scheme='NaturalBreaks',\n",
+ "    marker_kwds={\"radius\": 8, \"fillOpacity\": 0.8},\n",
+ "    cmap='viridis',\n",
+ "    tooltip=['store_id', 'store_performance'],\n",
+ "    name=\"Store Performance\"\n",
+ ")\n",
+ "\n",
+ "# Add layer control to toggle data on/off\n",
+ "folium.LayerControl().add_to(m)\n",
+ "\n",
+ "display(m)"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "gtLKIccGfBZX"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title 5. Export Data to CSV\n",
+ "from google.colab import files\n",
+ "\n",
+ "# 1. Save DataFrame to CSV in the Colab virtual machine\n",
+ "filename = 'store_performance_london.csv'\n",
+ "df_stores.to_csv(filename, index=False)\n",
+ "print(f\"✅ Saved {filename} to runtime.\")\n",
+ "\n",
+ "# 2. Trigger download to your local machine\n",
+ "files.download(filename)"
+ ],
+ "metadata": {
+ "id": "_AuX1wSncz-v"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file