4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: 3.x
- run: pip install zensical
- run: zensical build --clean
- run: pip install zensical mkdocs-material mkdocstrings-python
- run: zensical build --clean
- uses: actions/upload-pages-artifact@v4
with:
path: site
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/run_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ jobs:
fail-fast: false
matrix:
os: ["ubuntu-latest"]
python: ["3.11"]
python: ["3.11", "3.12", "3.13"]
defaults:
run:
shell: "bash -eo pipefail {0}"
Expand Down
6 changes: 6 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,9 @@ deps:

docs:
zensical build

serve:
python -mpanel serve dashboard/cleanup_dashboard.py --port 60002

poster:
python -mpanel serve scripts/poster.py --autoreload
15 changes: 13 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# IOC Cleanup

`ioc_cleanup` provides a
Check the online [documentation](https://oceanmodeling.github.io/ioc_cleanup/)

`ioc_cleanup` provides a
* reproducible
* transparent,
* and traceable
Expand All @@ -13,4 +15,13 @@ workflow for cleaning tide gauge (sea level) data from IOC (Intergovernmental Oc
## Use the dashboard
![demo](./docs/assets/dashboard_light.png)

Check the [docs](https://tomsail.github.io/ioc_cleanup/)
## How to cite

```
Saillour, T. and Mavrogiorgos, P.: Reproducible, transparent and traceable cleaning of IOC Tide Gauge Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7777, https://doi.org/10.5194/egusphere-egu26-7777, 2026.
```

## Poster
A reproducible poster has been created for the EGU 2026 conference (see DOI above).

You can consult it directly in the [poster section](https://oceanmodeling.github.io/ioc_cleanup/poster/) in the docs.
69 changes: 69 additions & 0 deletions docs/assets/cleaned_map.html

Large diffs are not rendered by default.

68 changes: 68 additions & 0 deletions docs/assets/coverage_oceans.html

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/assets/data_availability_hist.html

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/assets/data_availability_map.html

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/assets/data_removed_hist.html

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/assets/data_removed_map.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/example.html → docs/assets/example.html

Large diffs are not rendered by default.

68 changes: 68 additions & 0 deletions docs/assets/example_clean.html

Large diffs are not rendered by default.

68 changes: 68 additions & 0 deletions docs/assets/flat.html

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/assets/kamchatka_map.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/noise.html → docs/assets/noise.html

Large diffs are not rendered by default.

71 changes: 71 additions & 0 deletions docs/assets/poster.html

Large diffs are not rendered by default.

Binary file added docs/assets/poster.png
10 changes: 5 additions & 5 deletions docs/seiche.html → docs/assets/seiche.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/spikes.html → docs/assets/spikes.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/step_long.html → docs/assets/step_long.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/step_simple.html → docs/assets/step_simple.html

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions docs/assets/tonga_map.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/tsunami.html → docs/assets/tsunami.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/tsunami_detided.html → docs/assets/tsunami_detided.html

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions docs/unknown.html → docs/assets/unknown.html

Large diffs are not rendered by default.

69 changes: 0 additions & 69 deletions docs/cleaned_map.html

This file was deleted.

69 changes: 68 additions & 1 deletion docs/concept.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Motivation
## Why `ioc_cleanup`?

Cleaning tide gauge data is often:

Expand Down Expand Up @@ -57,3 +57,70 @@ declaring:
- notes and metadata

More details in the [JSON format](./reference/json-schema.md)


## Dataset Details

The following figures were generated with the helper scripts in the `scripts/` folder:

1. `download_ioc.py` to download IOC stations
2. `generate_maps.py` to create maps and graphs for the online documentation
3. `save_cleaning_scenarios.py` to create the time series graphs used in the online documentation

Steps 2 and 3 require step 1 to have been run for all cleaned IOC stations.

The cleaned stations dataset can be retrieved with:

```python
import ioc_cleanup as C
ioc = C.get_meta()
stats = C.calc_statistics(ioc, stations_dir=C.TRANSFORMATIONS_DIR, pattern="*.json")
```

### Cleaned Stations
<iframe
src="./assets/cleaned_map.html"
width="100%"
height="740"
style="border:none;">
</iframe>

### Coverage across oceans
<iframe
src="./assets/coverage_oceans.html"
width="100%"
height="330"
style="border:none;">
</iframe>

### Data availability in the 2020 - 2025 period

<iframe
src="./assets/data_availability_map.html"
width="100%"
height="590"
style="border:none;">
</iframe>

<iframe
src="./assets/data_availability_hist.html"
width="100%"
height="320"
style="border:none;">
</iframe>

### Ratio of data removed in the 2020 - 2025 period

<iframe
src="./assets/data_removed_map.html"
width="100%"
height="590"
style="border:none;">
</iframe>

<iframe
src="./assets/data_removed_hist.html"
width="100%"
height="320"
style="border:none;">
</iframe>
8 changes: 5 additions & 3 deletions docs/contributing.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,14 @@ Contributions are very welcome!
## How to contribute

1. Fork the repository
2. Add or update a JSON transformation file
3. Use the dashboard to clean or flag data
2. Add or update a [JSON transformation file](./reference/json-schema.md)
3. Use the [dashboard](./workflows/dashboard.md) to clean or flag data
4. Submit a pull request with a clear description of your changes

## Areas for improvement

* Publication and doi for clean dataset (WIP)
* Presentation at [EGU 2026](https://doi.org/10.5194/egusphere-egu26-7777)
* Publication of the cleaned dataset
* Add more IOC stations
* Extend the cleaned time range (currently 2020–2025)
* Other areas for improvement? Please open an [issue](https://github.com/oceanmodeling/ioc_cleanup)!
68 changes: 0 additions & 68 deletions docs/example_clean.html

This file was deleted.

51 changes: 39 additions & 12 deletions docs/guidelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ This repository does NOT contain IOC data and does not manage data acquisition.
In some cases, cleaning is easy and is just about removing spikes
### Spikes
<iframe
src="../spikes.html"
src="./assets/spikes.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -26,15 +26,15 @@ In some cases, cleaning is easy and is just about removing spikes

See details on the [JSON structure](reference/json-schema/)

### Noise vs. Physical phenomena
### Numerical vs. Physical phenomena

It becomes more difficult when it comes to distinguishing noise (either numerical or physical e.g. boat wakes) from real physical events (like harbour seiches or tsunamis).


#### Physical - Seiches
Here is an example of what seems to be a harbour seiche at `LA23` - Lampedusa station (IT):
<iframe
src="../seiche.html"
src="./assets/seiche.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -47,7 +47,7 @@ Here an example of what seems to be a harbour seiche in `LA23` - Lampedusa stati
Here is the 2025 Kamchatka Peninsula Tsunami captured by `cres` - Crescent City station (CA, USA):

<iframe
src="../tsunami.html"
src="./assets/tsunami.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -57,7 +57,7 @@ Here is the 2025 Kamchatka Peninsula Tsunami captured by `cres` - Crescent City
Same tsunami and station, detided:

<iframe
src="../tsunami_detided.html"
src="./assets/tsunami_detided.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -68,11 +68,11 @@ Same tsunami and station, detided:

See details on the [JSON structure](reference/json-schema/)

#### Noise - Numerical
#### Numerical - Noise
In some cases, numerical noise is easy to isolate, as for this station:

<iframe
src="../example.html"
src="./assets/example.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -83,7 +83,24 @@ In some case, numerical noise is easy to isolate like for this station:

See details on the [JSON structure](reference/json-schema/)

#### Noise - Unknown
#### Numerical - Flat signal
Some stations have stretches of flat signal.

<iframe
src="./assets/flat.html"
width="100%"
height="710"
style="border:none;">
</iframe>

!!! tip "Advice for flat signal"
Remove flat parts from the data.

If the flat parts are long enough and easy to isolate, select the flat ranges and paste them into `dropped_date_ranges`. If that is too complicated (as in this example), you can select multiple parts of the data and paste them into `dropped_timestamps`.

See details on the [JSON structure](reference/json-schema/)
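
Candidate flat ranges can also be located programmatically before reviewing them in the dashboard. The sketch below is only an illustration and is not part of `ioc_cleanup`: the helper `find_flat_ranges`, its `min_points` threshold and the sample data are hypothetical, and the exact format expected by `dropped_date_ranges` should be checked against the JSON structure reference.

```python
from datetime import datetime, timedelta
from itertools import groupby


def find_flat_ranges(samples, min_points=5):
    """Return (start, end) timestamps for runs of identical consecutive values.

    `samples` is a time-ordered list of (timestamp, value) pairs.
    `min_points` is a hypothetical threshold: shorter runs are ignored.
    """
    ranges = []
    # groupby() groups *consecutive* samples that share the same value
    for _, run in groupby(samples, key=lambda sample: sample[1]):
        run = list(run)
        if len(run) >= min_points:
            ranges.append((run[0][0], run[-1][0]))
    return ranges


# Synthetic one-minute signal with a flat stretch of five samples
start = datetime(2024, 1, 1)
values = [0.1, 0.2, 0.5, 0.5, 0.5, 0.5, 0.5, 0.3, 0.2, 0.1]
samples = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]

flat_ranges = find_flat_ranges(samples)
# ISO strings, to be reviewed by hand before copying them anywhere
flat_ranges_iso = [(a.isoformat(), b.isoformat()) for a, b in flat_ranges]
```

Exact equality between samples is a deliberately strict criterion; a real sensor may need a small tolerance instead, which is a judgment call left to the operator.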

#### Numerical - Unknown
In other cases, the nature of the noise is difficult to identify. There could be many reasons:

* physical induced noise:
Expand All @@ -96,15 +113,15 @@ In other case, the nature of the noise is difficult to identify. There could be

##### Example 1
<iframe
src="../noise.html"
src="./assets/noise.html"
width="100%"
height="710"
style="border:none;">
</iframe>

##### Example 2
<iframe
src="../unknown.html"
src="./assets/unknown.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -120,7 +137,7 @@ In other case, the nature of the noise is difficult to identify. There could be
Some steps are easy to isolate and deal with. A recurrent error found on tidal stations occurs during DST (Daylight saving time) changes:

<iframe
src="../step_simple.html"
src="./assets/step_simple.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -140,7 +157,7 @@ Some steps - or offsets - can be caused by mulitple reasons:
* any other unknown reason

<iframe
src="../step_long.html"
src="./assets/step_long.html"
width="100%"
height="710"
style="border:none;">
Expand Down Expand Up @@ -179,3 +196,13 @@ Some steps - or offsets - can be caused by mulitple reasons:
!!! warning "Subjectivity"
* Cleaning decisions are inherently subjective
* Different operators may disagree on what should be discarded

### De-tiding

!!! warning "Chunks length for de-tiding"
Although `ioc_cleanup` does not directly tackle the de-tiding problem, it leverages de-tiding methods to better isolate and flag bad data on the tide gauges.

Here are some resources on this matter:

* GitHub discussion on [Utide](https://github.com/orgs/oceanmodeling/discussions/25)
* Detiding Theory and Practices, cited in the [Book of tides](https://www.researchgate.net/publication/280722791_De-tiding_Theory_and_practice)
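
To illustrate why de-tiding helps to isolate bad data, here is a minimal sketch that removes a single tidal constituent by least squares and inspects the residual. It is an assumption-laden toy example, not the method used by `ioc_cleanup` or UTide: the function `detide_single_constituent`, the synthetic signal and the choice of fitting only M2 are all hypothetical.

```python
import numpy as np

# Period of the principal lunar semidiurnal constituent (M2), in hours
M2_PERIOD_HOURS = 12.4206012


def detide_single_constituent(hours, levels, period_hours=M2_PERIOD_HOURS):
    """Fit mean + one harmonic by least squares and return the residual."""
    omega = 2.0 * np.pi / period_hours
    # Design matrix: constant term, cosine and sine at the chosen frequency
    design = np.column_stack(
        [np.ones_like(hours), np.cos(omega * hours), np.sin(omega * hours)]
    )
    coeffs, *_ = np.linalg.lstsq(design, levels, rcond=None)
    return levels - design @ coeffs


# Synthetic 15-day record sampled every 15 minutes, with one artificial spike
hours = np.arange(0.0, 24.0 * 15, 0.25)
tide = 0.3 + 1.2 * np.cos(2.0 * np.pi / M2_PERIOD_HOURS * hours - 0.7)
spike = np.zeros_like(hours)
spike[500] = 0.8

residual = detide_single_constituent(hours, tide + spike)
spike_index = int(np.argmax(np.abs(residual)))
```

In the raw signal the 0.8 m spike is smaller than the tidal range, whereas in the residual it dominates. Real records contain many constituents and need longer chunks to separate them, which is what the resources above discuss.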
6 changes: 3 additions & 3 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ for cleaning tide gauge data from [IOC](https://www.ioc-sealevelmonitoring.org/l
All stations with clean data between 1 January 2020 and 31 December 2025.

<iframe
src="cleaned_map.html"
src="assets/cleaned_map.html"
width="100%"
height="740"
style="border:none;">
Expand Down Expand Up @@ -47,7 +47,7 @@ df_clean = C.transform(df_raw, trans)

### From raw signal...
<iframe
src="example.html"
src="assets/example.html"
width="100%"
height="710"
style="border:none;">
Expand All @@ -57,7 +57,7 @@ df_clean = C.transform(df_raw, trans)

### ... to clean signal
<iframe
src="example_clean.html"
src="assets/example_clean.html"
width="100%"
height="710"
style="border:none;">
Expand Down
11 changes: 11 additions & 0 deletions docs/poster.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
A reproducible poster has been created for the EGU 2026 conference ([doi](https://doi.org/10.5194/egusphere-egu26-7777)).

To reproduce the poster:
```bash
git clone https://github.com/oceanmodeling/ioc_cleanup.git
cd ioc_cleanup
pip install -r requirements/requirements.txt
make poster
```

You can also consult it directly [here](https://tomsail.github.io/ioc_cleanup/assets/poster.html)
37 changes: 37 additions & 0 deletions docs/tsunami.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Tsunami events

Over the 2020 - 2026 period, major tsunamis have been registered and flagged in `ioc_cleanup`:

## 2022 Tonga eruption
[Wikipedia link](https://en.wikipedia.org/wiki/2022_Hunga_Tonga%E2%80%93Hunga_Ha%CA%BBapai_eruption_and_tsunami)

<iframe
src="./assets/tonga_map.html"
width="100%"
height="430"
style="border:none;">
</iframe>

## 2025 Kamchatka earthquake
[Wikipedia link](https://en.wikipedia.org/wiki/2025_Kamchatka_earthquake)


Some X stations recorded the Kamchatka tsunami on the 30th July 2025.

<iframe
src="./assets/kamchatka_map.html"
width="100%"
height="320"
style="border:none;">
</iframe>

Here is the distribution of recorded wave heights:

and the highest wave (2.26 m), recorded in Crescent City (California):

<iframe
src="./assets/tsunami_detided.html"
width="100%"
height="710"
style="border:none;">
</iframe>
69 changes: 0 additions & 69 deletions docs/tsunami_map.html

This file was deleted.
