diff --git a/places_insights/notebooks/analyze_site_performance/README.md b/places_insights/notebooks/analyze_site_performance/README.md new file mode 100644 index 0000000..13e82c7 --- /dev/null +++ b/places_insights/notebooks/analyze_site_performance/README.md @@ -0,0 +1,25 @@ +# Analyze Site Performance with Places Insights and BigQuery ML + +This directory contains a complete Geospatial Machine Learning workflow demonstrating how to combine internal operational metrics with external environmental data to diagnose the location factors that drive site success. + +By leveraging **Places Insights**, **BigQuery ML**, and **H3 Spatial Indexing**, this sample shows how to move beyond anecdotal explanations and quantify exactly how local competitive density and neighborhood characteristics dictate performance. + +## Directory Contents + +* **`places_insights_analyze_site_performance_bigquery_ml.ipynb`**: The primary interactive workflow. It demonstrates how to ingest site data, engineer features using Spatial Joins (`ST_DWITHIN`) against the Places Insights dataset, train a Robust Linear Regression model in BigQuery ML, and visualize city-wide expansion opportunities using an interactive H3 grid map. +* **`places_insights_analyze_site_performance_data_generation.ipynb`**: An optional supplementary notebook. It demonstrates how to dynamically generate a realistic, synthetic training dataset of store locations in London by scoring geographic points based on their proximity to real-world amenities. +* **`store_performance_london.csv`**: The static, pre-generated dataset created by the data generation notebook. This allows users to run the main BigQuery ML workflow immediately without needing to generate their own data. + +## Getting Started + +### Prerequisites + +To execute these notebooks, you will need: +1. **Google Cloud Project**: With billing enabled and BigQuery active. +2. 
**Places Insights Access**: Your project must be subscribed to the [GB Places Insights dataset](https://developers.google.com/maps/documentation/placesinsights/cloud-setup) in BigQuery. +3. **Google Maps Platform API Key**: Required to render the interactive map visualizations. You must enable the **Maps JavaScript API** and **Maps Tiles API** on this key. + +### Execution Order + +1. *(Optional)* Run `places_insights_analyze_site_performance_data_generation.ipynb` to see how the synthetic correlation between performance and amenities is mathematically generated. +2. Run `places_insights_analyze_site_performance_bigquery_ml.ipynb`. The notebook automatically fetches the provided `store_performance_london.csv` dataset directly from GitHub to proceed with the BigQuery ML training and prospecting visualization. *(Note: If you ran the optional data generation step, you can modify the notebook to ingest your custom generated file instead).* \ No newline at end of file diff --git a/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_bigquery_ml.ipynb b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_bigquery_ml.ipynb new file mode 100644 index 0000000..c99bea4 --- /dev/null +++ b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_bigquery_ml.ipynb @@ -0,0 +1,731 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "private_outputs": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "code", + "source": [ + "# Copyright 2026 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# 
https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ], + "metadata": { + "id": "pDKqIF_IzOI6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# 📊 Analyze Site Performance with Places Insights and BigQuery ML\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"BigQuery
Open in BigQuery Studio\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
" + ], + "metadata": { + "id": "qcyzyPNI-YQ3" + } + }, + { + "cell_type": "markdown", + "source": [ + "### **Overview**\n", + "\n", + "This notebook demonstrates a Geospatial Machine Learning workflow. We will combine internal operational metrics (synthetic store performance) with external environmental data (**Places Insights**) to diagnose the location factors that drive success.\n", + "\n", + "By the end of this session, you will have a trained **Robust Linear Regression** model and an interactive **Prospecting Map** that scores every neighborhood in London based on its amenity profile.\n", + "\n", + "### **Key Technologies**\n", + "\n", + "* **[Google Places Insights](https://developers.google.com/maps/documentation/placesinsights):** A BigQuery-native dataset providing aggregated counts of Places (POIs) without needing to query an API.\n", + "* **[BigQuery ML](https://cloud.google.com/bigquery/docs/bqml-introduction):** Allows us to create, train, and deploy the machine learning model directly using standard SQL.\n", + "* **[H3 Spatial Indexing](https://h3geo.org/):** We use H3 to divide the city into uniform cells for consistent scoring and visualization.\n", + "* **[IPython Magics](https://googleapis.dev/python/bigquery-magics/latest/):** We use `%%bigquery` to write SQL directly in Colab cells.\n", + "\n", + "### **The Workflow**\n", + "\n", + "1. **Data Ingestion:** We upload a synthetic dataset of 400 - 500 stores across **London** with varying performance scores.\n", + "2. **Feature Engineering:** We use **Spatial Joins** (`ST_DWITHIN`) to count amenities (Gyms, Schools, Transit, etc.) within a 500m radius of every store.\n", + "3. **Model Training:** We train a **Robust Linear Regression** model (`ML.ROBUST_SCALER`) to predict performance while handling geospatial outliers.\n", + "4. **Evaluation:** We assess model accuracy using RΒ² and Mean Absolute Error (MAE) on a holdout test set.\n", + "5. 
**City-Wide Prospecting:** Instead of scoring a single site, we apply the model to the **entire London H3 Grid** (Resolution 8) to visualize performance hotspots across the city.\n", + "6. **Clean Up:** We provide a final step to delete the dataset and all created tables/models from your Google Cloud project.\n", + "\n", + "### **Prerequisites & Setup**\n", + "\n", + "* **Google Cloud Project:** You need a project with BigQuery enabled.\n", + "* **Places Insights Access:** Your project must be subscribed to the [GB Places Insights dataset](https://developers.google.com/maps/documentation/placesinsights/cloud-setup) in BigQuery sharing.\n", + "* **Google Maps Platform [API Key](https://developers.google.com/maps/get-started):** Required to render the final interactive map visualization. Enable the [**Maps JavaScript API**](https://developers.google.com/maps/documentation/javascript/get-api-key?setupProd=enable#enable-the-api) and [**Maps Tiles API**](https://developers.google.com/maps/documentation/tile/get-api-key?setupProd=enable#enable-the-api) on this key.\n", + "* **Colab Secrets:** Please add the following to the **Secrets** tab (Key icon on the left):\n", + " * `GCP_PROJECT_ID`: Your Google Cloud Project ID.\n", + " * `GMP_API_KEY`: The Google Maps API Key you configured in the previous step." + ], + "metadata": { + "id": "xqu18lHqJxM4" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iBQ867lfDZRf", + "cellView": "form" + }, + "outputs": [], + "source": [ + "# @title 1a. 
Setup & Authentication\n", + "# @markdown Authenticate to Google Cloud, retrieve secrets, and initialize the BigQuery client.\n", + "# @markdown\n", + "# @markdown This cell creates a BigQuery dataset for use throughout this notebook. Use the input below to select its region (default: US).\n", + "\n", + "import sys\n", + "import random\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import pandas_gbq\n", + "from google.colab import auth, userdata\n", + "from google.cloud import bigquery\n", + "import requests\n", + "import geopandas as gpd\n", + "import folium\n", + "\n", + "# 1. Retrieve Secrets\n", + "GCP_PROJECT_ID = userdata.get('GCP_PROJECT_ID').strip()\n", + "print(f\"✅ Secrets retrieved for project: {GCP_PROJECT_ID}\")\n", + "GMP_API_KEY = userdata.get('GMP_API_KEY').strip()\n", + "print(f\"✅ GMP API Key retrieved.\")\n", + "\n", + "# 2. Authenticate User\n", + "auth.authenticate_user(project_id=GCP_PROJECT_ID)\n", + "print(\"✅ User Authenticated.\")\n", + "\n", + "# 3. Global Configuration\n", + "DATASET_ID = \"places_insights_site_perf_demo\"\n", + "REGION = \"US\" # @param {type:\"string\"}\n", + "STORES_TABLE = f\"{DATASET_ID}.store_performance\"\n", + "FEATURES_TABLE = f\"{DATASET_ID}.store_features\"\n", + "MODEL_NAME = f\"{DATASET_ID}.site_performance_model\"\n", + "\n", + "# 4. Initialize BigQuery Dataset\n", + "client = bigquery.Client(project=GCP_PROJECT_ID)\n", + "ds = bigquery.Dataset(f\"{GCP_PROJECT_ID}.{DATASET_ID}\")\n", + "ds.location = REGION\n", + "client.create_dataset(ds, exists_ok=True)\n", + "print(f\"✅ Working dataset ready: {GCP_PROJECT_ID}.{DATASET_ID}\")" + ] + }, + { + "cell_type": "code", + "source": [ + "# @title 1b. Maps Backend Initialization: Session, Copyright & Assets\n", + "# @markdown This cell manages the Maps API handshake. It performs the following steps:\n", + "# @markdown 1. **Session Creation:** Authenticates and requests a \"Roadmap\" session for the target region.\n", + "# @markdown 2. 
**Attribution Fetching:** Queries the API for the copyright text required for the configured viewport.\n", + "# @markdown 3. **Asset Preparation:** Generates the HTML for the Google Maps logo overlay.\n", + "\n", + "# --- 1. Create Google Maps Session ---\n", + "print(\"🗺️ Initializing Google Maps Session...\")\n", + "session_url = f\"https://tile.googleapis.com/v1/createSession?key={GMP_API_KEY}\"\n", + "headers = {\"Content-Type\": \"application/json\"}\n", + "payload = {\n", + " \"mapType\": \"roadmap\",\n", + " \"language\": \"en-GB\",\n", + " \"region\": \"GB\"\n", + "}\n", + "\n", + "try:\n", + " response = requests.post(session_url, json=payload, headers=headers)\n", + " response.raise_for_status()\n", + " session_token = response.json().get(\"session\")\n", + " print(f\"✅ Session Token acquired.\")\n", + "except Exception as e:\n", + " raise RuntimeError(f\"Failed to initialize Google Maps session: {e}\")\n", + "\n", + "# --- 2. Fetch Dynamic Attribution for London ---\n", + "# Center of our synthetic data area\n", + "LAT, LNG = 51.5074, -0.1278\n", + "ZOOM_LEVEL = 11\n", + "delta = 0.2\n", + "\n", + "viewport_url = (\n", + " f\"https://tile.googleapis.com/tile/v1/viewport?key={GMP_API_KEY}\"\n", + " f\"&session={session_token}\"\n", + " f\"&zoom={ZOOM_LEVEL}\"\n", + " f\"&north={LAT + delta}&south={LAT - delta}\"\n", + " f\"&west={LNG - delta}&east={LNG + delta}\"\n", + ")\n", + "\n", + "try:\n", + " vp_response = requests.get(viewport_url)\n", + " vp_response.raise_for_status()\n", + " google_attribution = vp_response.json().get('copyright', 'Map data © Google')\n", + " print(\"✅ Attribution fetched.\")\n", + "except Exception as e:\n", + " print(f\"⚠️ Warning: Could not fetch attribution ({e}). Defaulting.\")\n", + " google_attribution = \"Map data © Google\"\n", + "\n", + "# --- 3. 
Construct Logo HTML ---\n", + "logo_url = \"https://maps.gstatic.com/mapfiles/api-3/images/google_white3.png\"\n", + "logo_html = f\"\"\"\n", + "
\n", + " \"Google\n", + "
\n", + "\"\"\"\n", + "print(\"✅ Logo HTML prepared.\")" + ], + "metadata": { + "cellView": "form", + "id": "mOPQnlbFHk3C" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### 2. Import Data\n", + "\n", + "In this step, we import the pre-generated dataset representing **store locations in London**.\n", + "\n", + "This dataset contains:\n", + "* `store_id`: Unique identifier.\n", + "* `store_performance`: The synthetic performance score (0-100).\n", + "* `location`: The geospatial location (Point).\n", + "\n", + "We will fetch the CSV from GitHub and persist it to BigQuery to serve as the foundation for our model training." + ], + "metadata": { + "id": "qp-17vZ_F6-3" + } + }, + { + "cell_type": "code", + "source": [ + "# @title 2a. Fetch Synthetic Data from GitHub\n", + "# @markdown We load the pre-generated `store_performance_london.csv` file directly from the public GitHub repository.\n", + "# @markdown\n", + "# @markdown *Curious how this synthetic dataset was created? Check out the [Data Generation Notebook](https://github.com/googlemaps-samples/insights-samples/blob/main/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb).*\n", + "import pandas as pd\n", + "\n", + "github_url = \"https://raw.githubusercontent.com/googlemaps-samples/insights-samples/main/places_insights/notebooks/analyze_site_performance/store_performance_london.csv\"\n", + "\n", + "print(\"⬇️ Fetching data from GitHub...\")\n", + "\n", + "# Read the CSV directly from the URL into a DataFrame\n", + "df_input = pd.read_csv(github_url)\n", + "\n", + "print(f\"✅ Successfully loaded {len(df_input)} rows.\")\n", + "display(df_input.head())" + ], + "metadata": { + "id": "Labi1lGgETuG" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 2b. 
Load Data to BigQuery\n", + "# @markdown We upload the DataFrame to the `STORES_TABLE` in BigQuery, casting the `location` column to `GEOGRAPHY`.\n", + "\n", + "# 1. Define Schema to ensure 'location' is parsed as GEOGRAPHY (not String)\n", + "table_schema = [\n", + " {'name': 'store_id', 'type': 'STRING'},\n", + " {'name': 'store_performance', 'type': 'FLOAT'},\n", + " {'name': 'location', 'type': 'GEOGRAPHY'}, # Critical: Casts WKT string to GEOGRAPHY\n", + "]\n", + "\n", + "# 2. Upload to BigQuery\n", + "print(f\"☁️ Uploading data to `{STORES_TABLE}`...\")\n", + "\n", + "pandas_gbq.to_gbq(\n", + " dataframe=df_input,\n", + " destination_table=STORES_TABLE,\n", + " project_id=GCP_PROJECT_ID,\n", + " if_exists='replace',\n", + " table_schema=table_schema,\n", + " location=REGION\n", + ")\n", + "\n", + "print(f\"✅ Successfully loaded data to `{STORES_TABLE}`.\")" + ], + "metadata": { + "id": "Ya668gzclbPB" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 3. Feature Engineering (Spatial Join)\n", + "# @markdown We now bridge the gap between internal performance data and the external world using a **Spatial Join**.\n", + "# @markdown\n", + "# @markdown **The Logic:**\n", + "# @markdown 1. **`ST_DWITHIN`:** For every store in our database, we look for Places within a **500-meter radius**.\n", + "# @markdown 2. **`COUNTIF`:** We calculate density vectors (e.g., \"How many gyms are nearby?\") to serve as our model features ($X$).\n", + "# @markdown 3. 
**Output:** The result is downloaded to the Python variable `df_features`.\n", + "\n", + "%%bigquery df_features --project $GCP_PROJECT_ID --location $REGION\n", + "\n", + "SELECT WITH AGGREGATION_THRESHOLD\n", + " internal.store_id,\n", + " internal.store_performance,\n", + "\n", + " -- Feature Engineering: count nearby POIs by type\n", + " COUNTIF('gym' IN UNNEST(places.types)) AS gym_count,\n", + " COUNTIF('restaurant' IN UNNEST(places.types)) AS restaurant_count,\n", + " COUNTIF('school' IN UNNEST(places.types)) AS school_count,\n", + " COUNTIF('transit_station' IN UNNEST(places.types)) AS transit_count,\n", + " COUNTIF('clothing_store' IN UNNEST(places.types)) AS clothing_store_count\n", + "\n", + "FROM\n", + " `places_insights_site_perf_demo.store_performance` AS internal\n", + "JOIN\n", + " `places_insights___gb.places` AS places\n", + " ON ST_DWITHIN(internal.location, places.point, 500) -- 500m Radius\n", + "WHERE\n", + " places.business_status = 'OPERATIONAL'\n", + "GROUP BY\n", + " internal.store_id, internal.store_performance" + ], + "metadata": { + "id": "J73jjZp8G7zh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @markdown Save the engineered features back to a permanent BQ table for training.\n", + "\n", + "pandas_gbq.to_gbq( # type: ignore\n", + " dataframe=df_features, # type: ignore\n", + " destination_table=FEATURES_TABLE,\n", + " project_id=GCP_PROJECT_ID,\n", + " if_exists='replace',\n", + " location=REGION\n", + ")\n", + "\n", + "print(f\"✅ Training data saved to `{FEATURES_TABLE}`\")\n", + "display(df_features.head()) # type: ignore" + ], + "metadata": { + "id": "SJqBUR3PHAhh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title **Exploratory Data Analysis: Feature Correlations**\n", + "# @markdown We use a **Pairplot** to visualize how each feature interacts with the target variable (`store_performance`).\n", + "# @markdown\n", + "# 
@markdown **Key Observations:**\n", + "# @markdown * **Linearity:** Look at the top row. You can see a clear positive linear trend between features like `restaurant_count` and `store_performance`. This confirms that a **Linear Regression** model is the right choice for this data.\n", + "# @markdown * **Distributions:** The diagonal histograms show that our amenity counts are \"right-skewed\" (mostly low numbers with a few high-density hubs), which is typical for geospatial data.\n", + "\n", + "import matplotlib.pyplot as plt\n", + "\n", + "input_features = ['store_performance', 'gym_count', 'restaurant_count', 'school_count', 'transit_count', 'clothing_store_count']\n", + "g = sns.pairplot(df_features[input_features], plot_kws={\"s\": 3, 'alpha': 0.6}, diag_kws={'color': 'crimson'}, height=1.8) # type: ignore\n", + "g.set(xlim=(0, 100), ylim=(0, 100))\n", + "\n", + "plt.show()" + ], + "metadata": { + "id": "W9bWzCWcQ5Nm" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 4. Train the Linear Regression Model\n", + "# @markdown We now train a **Linear Regression** model to predict store performance.\n", + "# @markdown\n", + "# @markdown **Key Model Configuration:**\n", + "# @markdown * **`ML.ROBUST_SCALER`:** We use this within the `TRANSFORM` clause. Unlike standard scaling (Mean/StdDev), robust scaling uses the **Median** and **IQR**. This is critical for geospatial data, where a single location with 500 restaurants (an outlier) could otherwise skew the entire model.\n", + "# @markdown * **`AUTO_SPLIT`:** We let BigQuery automatically reserve ~20% of the data for evaluation. 
This makes sure we test the model on data it has never seen before.\n", + "# @markdown * **`NORMAL_EQUATION`:** Since our dataset is small, we use the exact mathematical solution rather than an iterative approximation.\n", + "# @markdown * **Outlier Removal:** We exclude stores with `store_performance >= 75` to focus the model on predicting the mechanics of \"typical\" or \"developing\" sites, rather than established outliers.\n", + "\n", + "%%bigquery --project $GCP_PROJECT_ID --location $REGION\n", + "\n", + "CREATE OR REPLACE MODEL `places_insights_site_perf_demo.site_performance_model`\n", + "TRANSFORM(\n", + " store_performance,\n", + " -- Feature Engineering inside the model artifact\n", + " -- These stats are calculated on the TRAINING split only\n", + " ML.ROBUST_SCALER(gym_count) OVER() AS scaled_gym_count,\n", + " ML.ROBUST_SCALER(restaurant_count) OVER() AS scaled_restaurant_count,\n", + " ML.ROBUST_SCALER(school_count) OVER() AS scaled_school_count,\n", + " ML.ROBUST_SCALER(transit_count) OVER() AS scaled_transit_count,\n", + " ML.ROBUST_SCALER(clothing_store_count) OVER() AS scaled_clothing_store_count\n", + ")\n", + "OPTIONS(\n", + " model_type = 'LINEAR_REG',\n", + " input_label_cols = ['store_performance'],\n", + "\n", + " -- OPTIMIZATION PARAMETERS\n", + " optimize_strategy = 'NORMAL_EQUATION', -- Exact mathematical solution (fast for small data)\n", + " data_split_method = 'AUTO_SPLIT', -- Automatically reserves ~20% for evaluation\n", + "\n", + " -- DIAGNOSTICS\n", + " enable_global_explain = TRUE -- Essential to see feature importance\n", + ")\n", + "AS\n", + "SELECT\n", + " gym_count,\n", + " restaurant_count,\n", + " school_count,\n", + " transit_count,\n", + " clothing_store_count,\n", + " store_performance\n", + "FROM\n", + " `places_insights_site_perf_demo.store_features`\n", + "WHERE\n", + " store_performance < 75" + ], + "metadata": { + "id": "F1fk_m2XHMgz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + 
"source": [ + "# @title 5. Evaluate model performance\n", + "# @markdown We use `ML.EVALUATE` to test the model against the unseen \"Holdout\" data (the ~20% reserved automatically during training).\n", + "# @markdown The results (MAE, R2, etc.) are downloaded to the `df_eval` DataFrame for inspection in the next step.\n", + "\n", + "%%bigquery df_eval --project $GCP_PROJECT_ID --location $REGION\n", + "\n", + "SELECT *\n", + "FROM ML.EVALUATE(MODEL `places_insights_site_perf_demo.site_performance_model`)" + ], + "metadata": { + "id": "v_M7MRmsHQuD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @markdown ### **Interpretation of Results**\n", + "# @markdown * **R2 Score:** Measures how well the amenities explain the performance. A score close to **1.0** indicates a perfect fit. Since our data is synthetic and linear, we expect a very high score here (> 0.9).\n", + "# @markdown * **Mean Absolute Error (MAE):** The average \"miss\" in points. For example, an MAE of **1.5** means the model's prediction is typically within +/- 1.5 points of the actual score.\n", + "\n", + "print(f\"R2 Score: {df_eval['r2_score'][0]:.4f}\") # type: ignore\n", + "print(f\"Mean Absolute Error: {df_eval['mean_absolute_error'][0]:.4f}\") # type: ignore\n", + "display(df_eval) # type: ignore" + ], + "metadata": { + "id": "GOWw427HHTdy" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 6. Score London by H3 Cell (Native Places Insights)\n", + "# @markdown We now apply our trained model to London using the native Places Insights H3 function.\n", + "# @markdown\n", + "# @markdown **The Approach:**\n", + "# @markdown 1. **H3 Indexing & Counting:** We use `PLACES_COUNT_PER_H3` to get pre-aggregated counts of amenities within a 25km radius of central London.\n", + "# @markdown 2. 
**Pivoting:** Because the function returns one row per amenity type, we use `UNION ALL` and group the results to create the feature columns (`gym_count`, `restaurant_count`, etc.).\n", + "# @markdown 3. **Batch Prediction:** We feed these \"Grid Features\" into `ML.PREDICT` to generate a `predicted_store_performance` score for every cell.\n", + "\n", + "%%bigquery df_h3_predictions --project $GCP_PROJECT_ID --location $REGION\n", + "\n", + "WITH combined_counts AS (\n", + " -- Gyms\n", + " SELECT h3_cell_index, geography, count, 'gym' AS type\n", + " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000), -- 25km radius around London\n", + " 'h3_resolution', 8,\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'types', ['gym']\n", + " )\n", + " )\n", + " UNION ALL\n", + " -- Restaurants\n", + " SELECT h3_cell_index, geography, count, 'restaurant' AS type\n", + " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n", + " 'h3_resolution', 8,\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'types', ['restaurant']\n", + " )\n", + " )\n", + " UNION ALL\n", + " -- Schools\n", + " SELECT h3_cell_index, geography, count, 'school' AS type\n", + " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n", + " 'h3_resolution', 8,\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'types', ['school']\n", + " )\n", + " )\n", + " UNION ALL\n", + " -- Transit Stations\n", + " SELECT h3_cell_index, geography, count, 'transit_station' AS type\n", + " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n", + " 'h3_resolution', 8,\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'types', ['transit_station']\n", + " )\n", + " )\n", 
+ " UNION ALL\n", + " -- Clothing Stores\n", + " SELECT h3_cell_index, geography, count, 'clothing_store' AS type\n", + " FROM `places_insights___gb.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " 'geography', ST_BUFFER(ST_GEOGPOINT(-0.1278, 51.5074), 25000),\n", + " 'h3_resolution', 8,\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'types', ['clothing_store']\n", + " )\n", + " )\n", + "),\n", + "aggregated_features AS (\n", + " -- Pivot the stacked rows back into standard feature columns for the ML Model\n", + " SELECT\n", + " h3_cell_index AS h3_index,\n", + " ANY_VALUE(geography) AS h3_geography,\n", + " SUM(IF(type = 'gym', count, 0)) AS gym_count,\n", + " SUM(IF(type = 'restaurant', count, 0)) AS restaurant_count,\n", + " SUM(IF(type = 'school', count, 0)) AS school_count,\n", + " SUM(IF(type = 'transit_station', count, 0)) AS transit_count,\n", + " SUM(IF(type = 'clothing_store', count, 0)) AS clothing_store_count\n", + " FROM\n", + " combined_counts\n", + " GROUP BY\n", + " h3_cell_index\n", + ")\n", + "\n", + "-- Feed the pivoted features into the model\n", + "SELECT\n", + " h3_index,\n", + " predicted_store_performance,\n", + " h3_geography,\n", + " gym_count,\n", + " restaurant_count\n", + "FROM\n", + " ML.PREDICT(MODEL `places_insights_site_perf_demo.site_performance_model`,\n", + " (SELECT * FROM aggregated_features)\n", + " )\n", + "ORDER BY\n", + " predicted_store_performance DESC;" + ], + "metadata": { + "id": "UssuC8508R0L" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 7. 
Display H3 Prospecting Map\n", + "# @markdown We render the H3 grid as a choropleth layer.\n", + "# @markdown * **Yellow Areas:** High predicted performance (Hotspots).\n", + "# @markdown * **Purple Areas:** Low predicted performance (Coldspots).\n", + "# @markdown * **Interactive:** Hover over any cell to see the underlying amenity counts driving the score.\n", + "\n", + "import geopandas as gpd\n", + "from folium import Element\n", + "from shapely import wkt\n", + "\n", + "# --- 1. Prepare Data ---\n", + "# Explicitly convert the WKT strings from BigQuery into Shapely Geometry objects\n", + "if isinstance(df_h3_predictions['h3_geography'].iloc[0], str):\n", + " df_h3_predictions['h3_geography'] = df_h3_predictions['h3_geography'].apply(wkt.loads)\n", + "\n", + "# Create GeoDataFrame\n", + "gdf_h3 = gpd.GeoDataFrame(df_h3_predictions, geometry='h3_geography')\n", + "\n", + "# --- 2. Construct Tiles URL ---\n", + "tiles_url = f\"https://tile.googleapis.com/v1/2dtiles/{{z}}/{{x}}/{{y}}?session={session_token}&key={GMP_API_KEY}\"\n", + "\n", + "# --- 3. Initialize Map ---\n", + "m = folium.Map(\n", + " location=[51.5074, -0.1278],\n", + " zoom_start=11,\n", + " tiles=tiles_url,\n", + " attr=google_attribution,\n", + " name=\"Google Maps\",\n", + " control_scale=True,\n", + " prefer_canvas=True\n", + ")\n", + "\n", + "# --- 4. Add Google Logo (Bottom Left) ---\n", + "m.get_root().html.add_child(Element(logo_html))\n", + "\n", + "# --- 5. Add Custom Legend (Bottom Right) ---\n", + "legend_html = \"\"\"\n", + "
\n", + " Predicted Score\n", + "
Teal -> Green -> Yellow */\n", + " background: linear-gradient(to right, #440154, #3b528b, #21918c, #5ec962, #fde725);\n", + " margin-top: 8px;\n", + " margin-bottom: 4px;\n", + " \">
\n", + "
\n", + " Low (~20)\n", + " High (~80)\n", + "
\n", + "
\n", + "\"\"\"\n", + "m.get_root().html.add_child(Element(legend_html))\n", + "\n", + "# --- 6. Overlay H3 Grid ---\n", + "gdf_h3.explore(\n", + " m=m,\n", + " column='predicted_store_performance',\n", + " cmap='viridis',\n", + " vmin=20,\n", + " vmax=80,\n", + " # Style Keywords for Polygons (remove borders for smoother look)\n", + " style_kwds={'stroke': False, 'fillOpacity': 0.6},\n", + " tooltip=[\n", + " 'h3_index',\n", + " 'predicted_store_performance',\n", + " 'gym_count',\n", + " 'restaurant_count'\n", + " # Note: 'transit_count' removed because it wasn't selected in the SQL query\n", + " ],\n", + " name=\"Prospecting Heatmap\"\n", + ")\n", + "\n", + "# Add layer control to toggle data on/off\n", + "folium.LayerControl().add_to(m)\n", + "\n", + "display(m)" + ], + "metadata": { + "cellView": "form", + "id": "FwVtreNaj_q8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 8. Clean Up Resources\n", + "# @markdown This cell deletes the demo dataset (`places_insights_site_perf_demo`) and all tables and models within it.\n", + "# @markdown\n", + "# @markdown **You will be prompted to confirm before deletion.**\n", + "\n", + "from google.cloud.exceptions import NotFound\n", + "\n", + "# Validation\n", + "print(f\"⚠️ WARNING: You are about to DELETE the dataset: `{DATASET_ID}`\")\n", + "print(f\" Project: `{GCP_PROJECT_ID}`\")\n", + "print(\" This action cannot be undone.\")\n", + "\n", + "# Interactive Input\n", + "confirmation = input(\"Type 'yes' to proceed with deletion: \").strip().lower()\n", + "\n", + "if confirmation == 'yes':\n", + " print(f\"\\n🗑️ Deleting dataset: {DATASET_ID}...\")\n", + " try:\n", + " # delete_contents=True removes tables and models inside the dataset\n", + " # not_found_ok=True prevents errors if the dataset is already gone\n", + " client.delete_dataset(DATASET_ID, delete_contents=True, not_found_ok=True)\n", + " print(f\"✅ Successfully deleted dataset '{DATASET_ID}' and all contents.\")\n", + 
" except Exception as e:\n", + " print(f\"❌ Error deleting dataset: {e}\")\n", + "else:\n", + " print(f\"\\nπŸ›‘ Operation cancelled. Dataset `{DATASET_ID}` was NOT deleted.\")" + ], + "metadata": { + "cellView": "form", + "id": "zlgKYS-CTewm" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file diff --git a/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb new file mode 100644 index 0000000..aca7e9d --- /dev/null +++ b/places_insights/notebooks/analyze_site_performance/places_insights_analyze_site_performance_data_generation.ipynb @@ -0,0 +1,423 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "private_outputs": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "code", + "source": [ + "# Copyright 2026 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ], + "metadata": { + "id": "DQy8mJqQzvB6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Example Data Generation: Store Performance Model\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"BigQuery
Open in BigQuery Studio\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
" + ], + "metadata": { + "id": "9o7ZSvQX-2Kx" + } + }, + { + "cell_type": "markdown", + "source": [ + "# Example Data Generation: Store Performance Model\n", + "\n", + "### **Overview**\n", + "This notebook serves as the data generation engine for the **Analyze Site Performance with Google Places Insights and BigQuery ML** notebook.\n", + "\n", + "Instead of relying on pre-canned data, we demonstrate how to create a realistic training dataset from scratch. We generate randomized store locations in London and perform a geospatial join with the Places Insights dataset. This allows us to calculate synthetic performance scores based on the density of amenities (gyms, restaurants, transit, etc.) surrounding each specific point.\n", + "\n", + "**Key Features of this Notebook:**\n", + "* **Real-time Scoring:** Dynamically calculate store performance based on proximity to real-world places.\n", + "* **Visual Verification:** Interactively explore the generated data on a **Google Map** to sanity-check the spatial distribution and performance hotspots.\n", + "* **Data Export:** Download the final dataset as a CSV file to be used in the main analysis notebook.\n", + "\n", + "### **The Methodology**\n", + "To simulate realistic business metrics, we model the **Store Performance** ($Y$) as a linear function of the surrounding environment amenities, calculated using a **Multiple Linear Regression** approach.\n", + "\n", + "The performance score is determined by the count of specific amenities (Predictors) within a **500m radius** of each store, plus a noise term.\n", + "\n", + "The mathematical model is defined as:\n", + "\n", + "$$\n", + "Y = \\beta_0 + \\beta_1 X_{\\text{gym}} + \\beta_2 X_{\\text{restaurant}} + \\beta_3 X_{\\text{school}} + \\beta_4 X_{\\text{transit}} + \\beta_5 X_{\\text{clothing}} + \\epsilon\n", + "$$\n", + "\n", + "**Where:**\n", + "\n", + "* $Y$: **Store Performance** (Response Variable), clipped to range $[0, 100]$.\n", + "* $\\beta_0$: **Intercept**, 
set to a base value of **20**.\n", + "* $X_i$: **Predictors**, representing the count of operational places within 500m.\n", + "* $\\beta_i$: **Coefficients** (weights) assigned to each predictor:\n", + " * $\\beta_1 = 0.2$ (Gyms)\n", + " * $\\beta_2 = 0.4$ (Restaurants)\n", + " * $\\beta_3 = 0.1$ (Schools)\n", + " * $\\beta_4 = 0.1$ (Transit Stations)\n", + " * $\\beta_5 = 0.2$ (Clothing Stores)\n", + "* $\\epsilon$: **Error Term** (Noise), added to introduce variance.\n", + "\n", + "### **Prerequisites & Setup**\n", + "\n", + "* **Google Cloud Project:** You need a project with BigQuery enabled.\n", + "* **Places Insights Access:** Your project must be subscribed to the [GB Places Insights dataset](https://developers.google.com/maps/documentation/placesinsights/cloud-setup) in BigQuery sharing.\n", + "* **Google Maps Platform [API Key](https://developers.google.com/maps/get-started):** Required to render the final interactive map visualization. Enable the [**Maps JavaScript API**](https://developers.google.com/maps/documentation/javascript/get-api-key?setupProd=enable#enable-the-api) and [**Maps Tiles API**](https://developers.google.com/maps/documentation/tile/get-api-key?setupProd=enable#enable-the-api) on this key.\n", + "* **Colab Secrets:** Please add the following to the **Secrets** tab (Key icon on the left):\n", + " * `GCP_PROJECT_ID`: Your Google Cloud Project ID.\n", + " * `GMP_API_KEY`: The Google Maps API Key you configured in the previous step." + ], + "metadata": { + "id": "SO62kdnXf9vq" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HyozE39vag-O" + }, + "outputs": [], + "source": [ + "# @title 1. 
Setup & Authentication\n", + "import random\n", + "import requests\n", + "import pandas as pd\n", + "import geopandas as gpd\n", + "import folium\n", + "from folium import Element\n", + "from google.colab import auth, userdata\n", + "from google.cloud import bigquery\n", + "from shapely import wkt\n", + "\n", + "# 1. Retrieve Secrets\n", + "GCP_PROJECT_ID = userdata.get('GCP_PROJECT_ID').strip()\n", + "print(f\"✅ Secrets retrieved for project: {GCP_PROJECT_ID}\")\n", + "GMP_API_KEY = userdata.get('GMP_API_KEY').strip()\n", + "print(f\"✅ GMP API Key retrieved.\")\n", + "\n", + "# 2. Authenticate User\n", + "auth.authenticate_user(project_id=GCP_PROJECT_ID)\n", + "print(\"✅ User Authenticated.\")\n", + "\n", + "# 3. Initialize BigQuery Client\n", + "client = bigquery.Client(project=GCP_PROJECT_ID)\n", + "print(\"✅ BigQuery Client Initialized.\")" + ] + }, + { + "cell_type": "code", + "source": [ + "# @title 2. Generate Synthetic Data & Calculate Scores\n", + "# @markdown Note: This cell takes ~2 minutes to execute.\n", + "from shapely import wkt\n", + "\n", + "print(\"Generating synthetic locations in London...\")\n", + "\n", + "# 1. Generate Random Locations & Noise\n", + "LAT_MIN, LAT_MAX = 51.30, 51.70\n", + "LNG_MIN, LNG_MAX = -0.50, 0.30\n", + "\n", + "sql_structs = []\n", + "\n", + "for i in range(1, 501):\n", + " lng = random.uniform(LNG_MIN, LNG_MAX)\n", + " lat = random.uniform(LAT_MIN, LAT_MAX)\n", + " noise = random.gauss(0, 2)\n", + "\n", + " # STRUCT construction\n", + " sql_structs.append(\n", + " f\"STRUCT('STORE_{i:03d}' as store_id, ST_GEOGPOINT({lng:.4f}, {lat:.4f}) as location, {noise:.4f} as noise)\"\n", + " )\n", + "\n", + "generated_data_sql = \",\\n\".join(sql_structs)\n", + "\n", + "# 2. 
Construct Query\n", + "# We convert location to Text (ST_ASTEXT) to allow Grouping\n", + "query = f\"\"\"\n", + "WITH t AS (\n", + " SELECT * FROM UNNEST([\n", + " {generated_data_sql}\n", + " ])\n", + ")\n", + "SELECT WITH AGGREGATION_THRESHOLD\n", + " t.store_id,\n", + "\n", + " ST_ASTEXT(t.location) as location_wkt,\n", + "\n", + " -- Linear Model\n", + " GREATEST(0, LEAST(100,\n", + " 20 +\n", + " (0.2 * COUNTIF('gym' IN UNNEST(p.types))) +\n", + " (0.4 * COUNTIF('restaurant' IN UNNEST(p.types))) +\n", + " (0.1 * COUNTIF('school' IN UNNEST(p.types))) +\n", + " (0.1 * COUNTIF('transit_station' IN UNNEST(p.types))) +\n", + " (0.2 * COUNTIF('clothing_store' IN UNNEST(p.types))) +\n", + " AVG(t.noise)\n", + " )) AS store_performance\n", + "FROM\n", + " t\n", + "LEFT JOIN\n", + " `places_insights___gb.places` AS p\n", + " ON ST_DWITHIN(t.location, p.point, 500)\n", + " AND p.business_status = 'OPERATIONAL'\n", + "GROUP BY\n", + " t.store_id, location_wkt\n", + "\"\"\"\n", + "\n", + "print(\"Executing BigQuery GIS join...\")\n", + "\n", + "# 3. Execute to standard DataFrame (not GeoDataFrame yet)\n", + "df = client.query(query).to_dataframe()\n", + "\n", + "# 4. Convert WKT String back to Geometry object\n", + "df['geometry'] = df['location_wkt'].apply(wkt.loads)\n", + "\n", + "# 5. Convert to GeoDataFrame (BigQuery GEOGRAPHY values are WGS84 lon/lat)\n", + "df_stores = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')\n", + "\n", + "# 6. Cleanup: Drop redundant text column and rename geometry to 'location'\n", + "# This matches the schema expected by BigQuery in the subsequent notebook.\n", + "df_stores = df_stores.drop(columns=['location_wkt'])\n", + "df_stores = df_stores.rename_geometry('location')\n", + "\n", + "print(f\"✅ Generated and scored {len(df_stores)} stores.\")\n", + "display(df_stores.head())" + ], + "metadata": { + "id": "8nkGpuuQa5Ez" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 3. 
Maps backend Initialization: Session, Copyright & Assets\n", + "# @markdown This cell manages the API handshake. It performs the following steps:\n", + "# @markdown 1. **Session Creation:** Authenticates and requests a \"Roadmap\" session for the target region.\n", + "# @markdown 2. **Attribution Fetching:** Queries the API for the specific copyright text required for the configured viewport.\n", + "# @markdown 3. **Asset Preparation:** Generates the compliant HTML for the Google Maps logo overlay.\n", + "\n", + "# --- 1. Create Google Maps Session ---\n", + "print(\"🗺️ Initializing Google Maps Session...\")\n", + "session_url = f\"https://tile.googleapis.com/v1/createSession?key={GMP_API_KEY}\"\n", + "headers = {\"Content-Type\": \"application/json\"}\n", + "payload = {\n", + " \"mapType\": \"roadmap\",\n", + " \"language\": \"en-GB\",\n", + " \"region\": \"GB\"\n", + "}\n", + "\n", + "try:\n", + " response = requests.post(session_url, json=payload, headers=headers)\n", + " response.raise_for_status()\n", + " session_token = response.json().get(\"session\")\n", + " print(f\"✅ Session Token acquired.\")\n", + "except Exception as e:\n", + " raise RuntimeError(f\"Failed to initialize Google Maps session: {e}\")\n", + "\n", + "# --- 2. 
Fetch Dynamic Attribution for London ---\n", + "# Center of our synthetic data area\n", + "LAT, LNG = 51.5074, -0.1278\n", + "ZOOM_LEVEL = 11\n", + "delta = 0.2\n", + "\n", + "viewport_url = (\n", + " f\"https://tile.googleapis.com/tile/v1/viewport?key={GMP_API_KEY}\"\n", + " f\"&session={session_token}\"\n", + " f\"&zoom={ZOOM_LEVEL}\"\n", + " f\"&north={LAT + delta}&south={LAT - delta}\"\n", + " f\"&west={LNG - delta}&east={LNG + delta}\"\n", + ")\n", + "\n", + "try:\n", + " vp_response = requests.get(viewport_url)\n", + " vp_response.raise_for_status()\n", + " google_attribution = vp_response.json().get('copyright', 'Map data © Google')\n", + " print(\"✅ Attribution fetched.\")\n", + "except Exception as e:\n", + " print(f\"⚠️ Warning: Could not fetch attribution ({e}). Defaulting.\")\n", + " google_attribution = \"Map data © Google\"\n", + "\n", + "# --- 3. Construct Logo HTML ---\n", + "logo_url = \"https://maps.gstatic.com/mapfiles/api-3/images/google_white3.png\"\n", + "logo_html = f\"\"\"\n", + "
\n", + " \"Google\n", + "
\n", + "\"\"\"\n", + "print(\"✅ Logo HTML prepared.\")" + ], + "metadata": { + "cellView": "form", + "id": "-RHoNZHScZmQ" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 4. Display Map with Data Overlay\n", + "# @markdown Renders the interactive map with the following components:\n", + "# @markdown * **Google Maps 2D Tiles:** As the base layer.\n", + "# @markdown * **Store Data Overlay:** Points colored by their calculated `store_performance` score (Purple=Low, Yellow=High).\n", + "# @markdown * **Custom Controls:** A performance score legend and the required Google logo.\n", + "\n", + "# --- 1. Construct Tiles URL ---\n", + "tiles_url = f\"https://tile.googleapis.com/v1/2dtiles/{{z}}/{{x}}/{{y}}?session={session_token}&key={GMP_API_KEY}\"\n", + "\n", + "# --- 2. Initialize Map ---\n", + "m = folium.Map(\n", + " location=[LAT, LNG],\n", + " zoom_start=ZOOM_LEVEL,\n", + " tiles=tiles_url,\n", + " attr=google_attribution,\n", + " name=\"Google Maps\",\n", + " control_scale=True,\n", + " prefer_canvas=True\n", + ")\n", + "\n", + "# --- 3. Add Google Logo (Bottom Left) ---\n", + "m.get_root().html.add_child(Element(logo_html))\n", + "\n", + "# --- 4. Add Custom Legend/Key (Bottom Right) ---\n", + "# We use a CSS linear-gradient that matches the 'viridis' colormap used below\n", + "legend_html = \"\"\"\n", + "
\n", + " Performance Score\n", + "
Teal -> Green -> Yellow */\n", + " background: linear-gradient(to right, #440154, #3b528b, #21918c, #5ec962, #fde725);\n", + " margin-top: 8px;\n", + " margin-bottom: 4px;\n", + " \">
\n", + "
\n", + " Low (~20)\n", + " High (~80)\n", + "
\n", + "
\n", + "\"\"\"\n", + "m.get_root().html.add_child(Element(legend_html))\n", + "\n", + "# --- 5. Overlay Store Data ---\n", + "df_stores.explore(\n", + " m=m, # Add to our Google Map instance\n", + " column='store_performance',\n", + " vmin=20,\n", + " vmax=80,\n", + " scheme='NaturalBreaks',\n", + " marker_kwds={\"radius\": 8, \"fillOpacity\": 0.8},\n", + " cmap='viridis',\n", + " tooltip=['store_id', 'store_performance'],\n", + " name=\"Store Performance\"\n", + ")\n", + "\n", + "# Add layer control to toggle data on/off\n", + "folium.LayerControl().add_to(m)\n", + "\n", + "display(m)" + ], + "metadata": { + "cellView": "form", + "id": "gtLKIccGfBZX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title 5. Export Data to CSV\n", + "from google.colab import files\n", + "\n", + "# 1. Save DataFrame to CSV in the Colab virtual machine\n", + "filename = 'store_performance_london.csv'\n", + "df_stores.to_csv(filename, index=False)\n", + "print(f\"✅ Saved {filename} to runtime.\")\n", + "\n", + "# 2. Trigger download to your local machine\n", + "files.download(filename)" + ], + "metadata": { + "id": "_AuX1wSncz-v" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file