EarthEmbeddingExplorer

EarthEmbeddingExplorer is an interactive web application for cross-modal retrieval of global satellite imagery. It allows you to search the Earth using natural language, images, or geographic coordinates — no need to download terabytes of data or write a single line of code.

Whether you are an AI researcher exploring vision-language models, a remote-sensing scientist looking for specific land-cover patterns, or simply curious about what satellite imagery can reveal, this tool is designed to be accessible and informative for everyone.

Overview

Imagine being able to type "a satellite image of a glacier" or "a city with a coastline" and instantly see matching locations on a world map, together with the actual satellite imagery. That is what EarthEmbeddingExplorer does.

Under the hood, the application encodes your query and ~249k satellite images into embedding vectors — compact numerical representations that capture semantic meaning. By measuring vector similarity, the system finds the most relevant images across the globe and visualizes them interactively.

Key features:

🔍 Text-to-Image Retrieval — Search satellite imagery with free-form natural language.
🖼️ Image-to-Image Retrieval — Upload a photo and find visually similar locations on Earth.
📍 Location-to-Image Retrieval — Input GPS coordinates or click on a map to discover what a specific place looks like from space.
⚡ Near Real-Time Search — Pre-computed embeddings and on-demand HTTP Range requests make retrieval fast without downloading the full dataset.
🌍 Global Coverage — Based on the MajorTOM dataset, covering the Earth's land surface with Sentinel-2 imagery.

How It Works

Satellite Imagery Dataset

We use MajorTOM (Major TOM: Expandable Datasets for Earth Observation), a large-scale dataset released by the European Space Agency (ESA). Specifically, we work with the Core-S2L2A-249k subset, which provides global Sentinel-2 Level-2A multispectral imagery at 10 m ground resolution.

Dataset	Source	Samples	Sensor
MajorTOM-Core-S2L2A	Sentinel-2 L2A	2,245,886	Multispectral (10 m)

Because the original tiles are large (1068 × 1068 pixels) and the full dataset exceeds 23 TB, we create a lightweight, search-friendly version:

Center Cropping: From each tile we extract the central 384 × 384 patch, a common input size for vision transformers.
Uniform Sampling: Using MajorTOM's hierarchical grid coding system, we sample roughly 11% of the data (~249k of the 2,245,886 images). This preserves global geographic coverage while keeping the embedding index small enough for interactive search.
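
The center-cropping step can be illustrated with a minimal NumPy sketch. The function name `center_crop` and the dummy all-zero tile are assumptions made for illustration; the project's own preprocessing code may differ.

```python
import numpy as np


def center_crop(tile: np.ndarray, size: int = 384) -> np.ndarray:
    """Return the central size x size window of an (H, W, ...) array."""
    height, width = tile.shape[:2]
    if height < size or width < size:
        raise ValueError(f"tile {height}x{width} is smaller than crop size {size}")
    top = (height - size) // 2
    left = (width - size) // 2
    return tile[top:top + size, left:left + size]


# A dummy 1068 x 1068 RGB array stands in for a real Sentinel-2 tile.
tile = np.zeros((1068, 1068, 3), dtype=np.uint8)
patch = center_crop(tile)
print(patch.shape)  # (384, 384, 3)
```

For a 1068-pixel tile the crop starts at offset (1068 - 384) // 2 = 342 on both axes.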

Geographic distribution of our sampled satellite-image embeddings.

Embedding Models

The retrieval engine is powered by four complementary embedding models. Each one is an encoder that maps its supported inputs (images, text, or coordinates) into a vector space of its own, so similarity is only ever compared between embeddings from the same model. If you are coming from remote sensing, think of them as feature extractors that turn raw pixels (and optional metadata) into comparable signatures.

The four models we use:

Model	Modality	Training Data	Best For
SigLIP [3]	image + text	Natural image–text pairs (web)	General open-vocabulary text queries
FarSLIP [4]	image + text	Satellite image–text pairs (RS-specific)	Fine-grained remote-sensing concepts
SatCLIP [5]	image + location	Satellite image–GPS coordinate pairs	Location-aware retrieval
DINOv2 [7]	image only	Natural images (self-supervised)	Pure visual similarity search

SigLIP improves upon CLIP with a sigmoid loss and works well for everyday vocabulary.
FarSLIP is fine-tuned on remote-sensing captions, making it better at concepts like "deforestation" or "salt evaporation ponds".
SatCLIP jointly encodes images and their geographic coordinates, enabling queries like "show me places near (lat, lon)".
DINOv2 learns powerful visual features without any text supervision; it excels at "find me images that look like this one".

Models such as CLIP [2] learn to align images and text by training on massive pairs of (image, caption) data from the web. An image encoder compresses a photo into a vector; a text encoder does the same for a sentence. The key property is that semantically matching pairs end up close together in vector space, while unrelated pairs are far apart.
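
To make the alignment objective concrete, here is a minimal NumPy sketch of a CLIP-style symmetric contrastive loss. It is not the training code of any model listed above: the function names and the temperature of 0.07 are illustrative assumptions, and SigLIP in particular replaces this softmax formulation with a sigmoid loss.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def clip_style_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric cross-entropy over an (N, N) image-text similarity matrix.

    Row i of image_emb and row i of text_emb are assumed to be a matching pair,
    so the correct "class" for every row is its diagonal entry.
    """
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature

    def diagonal_cross_entropy(lg: np.ndarray) -> float:
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))
aligned = clip_style_loss(images, images)          # every pair matches
misaligned = clip_style_loss(images, images[::-1])  # pairs are scrambled
print(aligned < misaligned)  # True
```

Minimizing this loss pulls matching pairs together and pushes the other pairs in the batch apart, which is the property the retrieval step relies on.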

How contrastive learning connects images and text or locations in a shared embedding space.

Turning satellite images into embedding vectors for fast similarity search.

Search pipeline:

We pre-compute embeddings for all ~249k sampled satellite images using each of the four models.
When you submit a query (text, image, or coordinates), the app encodes it with the corresponding model's encoder.
Cosine similarity scores are computed against the entire image index.
High-scoring locations are plotted on an interactive map, and the top-5 most similar images are fetched on demand.
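
The scoring step of the pipeline above can be sketched as a brute-force cosine search in NumPy. The function name `top_k_cosine`, the 64-dimensional random index, and the index size are assumptions for illustration only; the application's real index holds ~249k embeddings per model.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, index: np.ndarray, k: int = 5):
    """Return (row_ids, scores) of the k index rows most similar to the query."""
    q = query / np.linalg.norm(query)
    rows = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = rows @ q                          # cosine similarity per row
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k candidates
    top = top[np.argsort(-scores[top])]        # order them best first
    return top, scores[top]


# Random vectors stand in for pre-computed image embeddings.
rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 64))
query = index[42]                              # query with a known match
ids, sims = top_k_cosine(query, index, k=5)
print(ids[0])                                  # row 42 ranks first
```

If the stored embeddings are already unit-normalized, the per-query normalization of the index can be skipped and the search reduces to one matrix-vector product.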

System Architecture

EarthEmbeddingExplorer system architecture on ModelScope.

The application is designed for cloud-native deployment on ModelScope:

Models & embeddings are hosted on ModelScope (or Hugging Face) and downloaded on first use.
Raw imagery stays in remote Parquet shards. Each row of the embedding dataset contains the fields parquet_url and parquet_row, so once an embedding is retrieved, the system immediately knows which remote Parquet shard and which row contain the corresponding raw image—no extra index lookup is needed.
On-demand fetching uses HTTP Range requests to download only the necessary byte ranges (target row groups and the thumbnail column) from a Parquet file. This avoids downloading the full 23 TB dataset and enables near real-time image display.
GPU acceleration is provided by xGPU, allowing flexible allocation of GPU resources for encoding queries.
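
The on-demand fetching idea can be sketched with the Python standard library. This is a simplified illustration, not the application's implementation, which may rely on a Parquet library instead. The helper names are hypothetical, the byte offsets are made up, and the sketch assumes parquet_row counts rows from the start of a single shard. It relies on two facts: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1, and an HTTP Range header selects an inclusive byte span.

```python
import struct
import urllib.request
from bisect import bisect_right
from itertools import accumulate

PARQUET_MAGIC = b"PAR1"


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build (without sending) a GET request for bytes start..end inclusive."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def footer_length(tail: bytes) -> int:
    """Parse the last 8 bytes of a Parquet file into the footer size in bytes."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    return struct.unpack("<I", tail[:4])[0]


def locate_row(row: int, row_group_sizes: list) -> tuple:
    """Map a row index within one file to (row_group_index, offset_in_group)."""
    ends = list(accumulate(row_group_sizes))
    if not ends or not 0 <= row < ends[-1]:
        raise IndexError(f"row {row} is outside the file")
    group = bisect_right(ends, row)
    return group, row - (ends[group] - row_group_sizes[group])


url = "https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00001.parquet"
req = range_request(url, 100, 199)            # illustrative offsets only
print(req.get_header("Range"))                # bytes=100-199
print(locate_row(249, [100, 100, 50]))        # (2, 49)
```

A client would first fetch the file tail to read the footer, use the row-group metadata in the footer to find the byte span of the thumbnail column chunk for the target row group, and then issue one more ranged request for that span.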

Datasets & Pre-computed Embeddings

Source Imagery

Dataset	Link	Description
Core-S2L2A-249k	ModelScope	Sampled subset of MajorTOM Core-S2L2A used in this project

Pre-computed Embedding Datasets

Each embedding dataset contains the vector representation of every sampled image, together with metadata (grid_cell, parquet_url, parquet_row) needed to retrieve the original pixels on demand.

Model	Embedding Dataset	Link
SigLIP	Core-S2RGB-249k-SigLIP	ModelScope
FarSLIP	Core-S2RGB-249k-FarSLIP	ModelScope
DINOv2	Core-S2RGB-249k-DINOv2	ModelScope
SatCLIP	Core-S2RGB-249k-SatCLIP	ModelScope

Note for developers: The parquet_url field stores a direct Hugging Face URL (e.g., https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00001.parquet) and parquet_row stores the global row index, enabling online image download when the app is deployed on ModelScope or Hugging Face Spaces.
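
As a sketch of how these metadata fields might be used, the snippet below groups retrieved hits by parquet_url so that each remote shard is visited once. The `Hit` class, the `group_by_shard` helper, and all field values are hypothetical placeholders; only the field names grid_cell, parquet_url, and parquet_row come from the dataset description.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved embedding row; the values used below are placeholders."""
    grid_cell: str
    parquet_url: str
    parquet_row: int
    score: float


def group_by_shard(hits):
    """Return {parquet_url: sorted row indices} for a list of hits."""
    shards = defaultdict(list)
    for hit in hits:
        shards[hit.parquet_url].append(hit.parquet_row)
    return {url: sorted(rows) for url, rows in shards.items()}


hits = [
    Hit("cell_a", "https://example.org/part_00001.parquet", 120, 0.91),
    Hit("cell_b", "https://example.org/part_00002.parquet", 7, 0.90),
    Hit("cell_c", "https://example.org/part_00001.parquet", 15, 0.88),
]
plan = group_by_shard(hits)
print(plan["https://example.org/part_00001.parquet"])  # [15, 120]
```

Grouping is optional for a handful of results, but it avoids re-reading the same shard footer when several hits share a file.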

Quick Start

Use on ModelScope Studio

EarthEmbeddingExplorer is hosted live on ModelScope Studio (modelscope.cn for China, modelscope.ai for international users) and Hugging Face Spaces. We recommend ModelScope for the smoothest experience: it provides free GPU resources and optimized bandwidth for downloading imagery directly from the Parquet shards.

Local Deployment

# 1. Clone the repository
git clone https://github.com/VoyagerX/EarthEmbeddingExplorer.git
cd EarthEmbeddingExplorer

# 2. Install dependencies
pip install -r requirements.txt

# 3. Launch the app
python app.py

By default the app downloads models and embeddings from ModelScope. You can switch the download endpoint via the environment variable:

export DOWNLOAD_ENDPOINT="modelscope.cn"  # China users (fastest domestic access)
# export DOWNLOAD_ENDPOINT="modelscope.ai"  # International users (ModelScope global)
# export DOWNLOAD_ENDPOINT="huggingface"    # International users (Hugging Face)
python app.py

Tip: If you are in mainland China, use modelscope.cn for the fastest download speeds. International users should use modelscope.ai or huggingface.

Examples

Text Search

Type a natural-language description and see matching locations worldwide.

Query: "a satellite image of a glacier"

Image Search

Upload an image and retrieve geographically diverse locations that share visual similarity.

Query image → similar satellite locations in the Amazon region.

Location Search

Click on the map or enter GPS coordinates to discover what that place looks like from space.

Location query near the Amazon basin.

Insights from Cross-Modal Retrieval

EarthEmbeddingExplorer doubles as a diagnostic tool for embedding models. Comparing how different encoders respond to the same query quickly reveals domain gaps and geographic biases that static benchmarks do not show. For example:

Domain gap. SigLIP, trained on everyday web images, often stumbles on geoscientific terms (e.g., "glacier", "salt evaporation ponds"). FarSLIP closes this gap via RS-specific fine-tuning, yet then underperforms on generic non-RS queries—exposing a specialization–generality trade-off.
Geographic bias. Side-by-side maps show uneven global priors. For "snow covered mountains", FarSLIP concentrates on Asia’s high-elevation belts (Himalayas, Kunlun, Tianshan), while SigLIP favors the Andes and New Zealand’s Southern Alps. For "glacier", FarSLIP retrieves polar and Antarctic regions, whereas SigLIP omits Antarctica—likely because polar imagery is absent from its pre-training corpus. Even well-specified prompts occasionally return mismatched patches (e.g., ocean tiles for land-cover concepts), pointing to limited geographic awareness in current embedding spaces.

For more details, please check our tutorial paper on arXiv.

Roadmap

Integrate FAISS for faster approximate nearest-neighbor search.
Support additional embedding models and datasets.
Increase the coverage of embedding datasets.
What features do you want? Leave an issue or start a discussion!

We warmly welcome new contributors. See CONTRIBUTING.md for guidelines on generating a new embedding dataset and submitting a pull request.

Acknowledgements

We thank the following open-source projects and datasets that made EarthEmbeddingExplorer possible:

Models:

SigLIP — Vision Transformer for image-text alignment
FarSLIP — Fine-grained remote-sensing language-image pretraining
SatCLIP — Satellite location-image pretraining
DINOv2 — Self-supervised vision transformer

Datasets:

MajorTOM — Expandable datasets for Earth observation by ESA

We are grateful to the research communities and organizations that developed and shared these resources.

Citation

If you use EarthEmbeddingExplorer in your research, please cite:

@inproceedings{zheng2026earthembeddingexplorer,
  title={EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images},
  author={Yijie Zheng and Weijie Wu and Bingyue Wu and Long Zhao and Guoqing Li and Mikolaj Czerkawski and Konstantin Klemmer},
  booktitle={4th ICLR Workshop on Machine Learning for Remote Sensing (Tutorial Track)},
  year={2026},
  url={https://openreview.net/forum?id=LSsEenJVqD}
}

References

[1] Francis, A., & Czerkawski, M. (2024). Major TOM: Expandable Datasets for Earth Observation. IGARSS 2024.

[2] Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML 2021.

[3] Zhai, X., et al. (2023). Sigmoid Loss for Language-Image Pre-Training. ICCV 2023.

[4] Li, Z., et al. (2025). FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding. arXiv 2025.

[5] Klemmer, K., et al. (2025). SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery. AAAI 2025.

[6] Czerkawski, M., Kluczek, M., & Bojanowski, J. S. (2024). Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space. arXiv preprint arXiv:2412.05600.

[7] Oquab, M., et al. (2023). DINOv2: Learning Robust Visual Features without Supervision. arXiv preprint arXiv:2304.07193.

[8] Zheng, Y., et al. (2026). EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images. 4th ICLR Workshop on ML4RS (Tutorial Track).
