Commit 37ab5c3

[FSTORE-1921] Improve documentation building process (#532)
* Attempt to use multirepo-mkdocs and mkdocstrings
* Add dark theme
* Style fixes
* Import API nav
* Fix spacing of nested nav items
* Hide full paths
* Prepare for multirepo builds
* Add markdownlint config and fixes
* Fix
* Place a sentence per line
  That is, replace ([^0-9])\. ([A-Z]) with $1.\n$2 plus process exceptions.
* Fix CSS
* Fix macro
* Fix dark theme CSS
* More fixes
* Upodate markdownlint action
* Fix tables
* Fix release action
* Add repo
* Fix header z-index
* Fix mobile
* Fix search overlay
* Fix sidebar overlay
* Fix branch
* Update GitHub links
* Start fixing {{{hopsworks_version}}} links
* Fix local links
* Use https
* Fix links
* Update deps
* Fix deps
* Fix mkdocs.yaml
* Copy CONTRIBUTING.md
* Fix style
* Remove redandant code
* Fix links
* Fix pydantic inv
* Fix more links
* Fix a link
* Fix missing docs
* Remove redundant, fix spine
* Fix annotations and polars imports
* Final change
* Final fixes
* Fix z-index back
* Pin versions
* Update deps
* Figure out the latest version automatically
* Hide databricks docs
* Deploy dev docs
* Mention the development version
1 parent 8bc5799 commit 37ab5c3
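
The commit message describes the "sentence per line" change as a regex substitution. A minimal sketch of that substitution in JavaScript follows; the function name `splitSentencesPerLine` and the sample text are hypothetical, and the commit does not say which tool was actually used to apply the rule.

```javascript
// Rule quoted in the commit message:
//   replace ([^0-9])\. ([A-Z]) with $1.\n$2
// A period that does not follow a digit, then a space and an uppercase
// letter, is treated as a sentence boundary.
function splitSentencesPerLine(text) {
  return text.replace(/([^0-9])\. ([A-Z])/g, "$1.\n$2");
}

// Hypothetical sample text, not taken from the repository.
const sample =
  "Hopsworks provides a feature store. You can run jobs. See section 3. Then continue.";
console.log(splitSentencesPerLine(sample));
```

Because the rule skips periods that follow a digit, "section 3. Then" stays on one line. Abbreviations such as "e.g. This" would still be split, which is presumably what the "process exceptions" part of the commit message refers to.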

209 files changed

+4802
-3133
lines changed

.github/workflows/mkdocs-release.yml

Lines changed: 91 additions & 10 deletions
@@ -2,7 +2,13 @@ name: mkdocs-release
 
 on:
   push:
-    branches: [branch-*\.*]
+    branches:
+      - branch-*
+      - main
+
+  repository_dispatch:
+    types:
+      - trigger-rebuild
 
 concurrency:
   group: ${{ github.workflow }}
@@ -13,25 +19,100 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@v4
+      - name: Extract branch name (push)
+        if: ${{ github.event_name == 'push' }}
+        run: echo "BRANCH=${GITHUB_REF#refs/heads/}" >> "$GITHUB_ENV"
+
+      - name: Extract branch name (repository_dispatch)
+        if: ${{ github.event_name == 'repository_dispatch' }}
+        run: echo "BRANCH=${{ github.event.client_payload.branch }}" >> "$GITHUB_ENV"
+
+      - name: Extract version from branch name
+        if: ${{ env.BRANCH != 'main' }}
+        run: echo "HOPSWORKS_VERSION=${BRANCH#branch-}" >> "$GITHUB_ENV"
+
+      - name: Is latest release?
+        id: is-latest
+        uses: actions/github-script@v8
+        if: ${{ env.BRANCH != 'main' }}
+        with:
+          script: |
+            const branches_url = context.payload.repository.branches_url
+            const branches_url_new = branches_url.replace("{/branch}", "")
+            const result = await github.request(branches_url_new)
+            const names = result.data.map(branch => branch.name)
+            const versions = names.filter(name => name.startsWith('branch-')).map(name => name.replace('branch-', ''))
+            const minorLength = Math.max(...versions.map(v => v.split('.')[1].length))
+            const convertVersionToNumber = (version) => {
+              const parts = version.split('.').map(Number)
+              return parts[0]*10**minorLength + parts[1]
+            }
+
+            return Math.max(...versions.map(convertVersionToNumber)) === convertVersionToNumber(process.env.HOPSWORKS_VERSION)
+
+      - name: Checkout main repo
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0
+          ref: ${{ env.BRANCH }}
+
+      - name: Checkout the API repo
+        uses: actions/checkout@v4
+        with:
+          repository: logicalclocks/hopsworks-api
+          ref: ${{ env.BRANCH }}
+          path: hopsworks-api
+
+      - name: Cache local Maven repository
+        uses: actions/cache@v4
+        with:
+          path: ~/.m2/repository
+          key: ${{ runner.os }}-maven-${{ hashFiles('java/pom.xml') }}
+          restore-keys: |
+            ${{ runner.os }}-maven-
+
+      - name: Set up JDK 8
+        uses: actions/setup-java@v5
+        with:
+          java-version: "8"
+          distribution: "adopt"
+
+      - name: Build javadoc documentation
+        working-directory: hopsworks-api/java
+        run: mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests && cp -r target/site/apidocs ../../docs/javadoc
 
       - uses: actions/setup-python@v5
         with:
           python-version: "3.10"
 
-      - name: Install ubuntu dependencies
-        run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          activate-environment: true
+          working-directory: hopsworks-api/python
 
-      - name: install deps
-        run: pip3 install -r requirements-docs.txt
+      - name: Install Python API dependencies
+        run: uv sync --extra dev --group docs --project hopsworks-api/python
 
-      - name: setup git
+      - name: Install Python dependencies
+        run: uv pip install -r requirements-docs.txt
+
+      - name: Install Ubuntu dependencies
+        run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
+
+      - name: Setup git for mike
         run: |
           git config --global user.name Mike
           git config --global user.email [email protected]
 
-      # Put this back and increment version when cutting a new release branch
-      # - name: mike deploy docs
-      # run: mike deploy 3.0 latest -u --push
+      - name: Deploy the dev docs with mike
+        if: ${{ env.BRANCH == 'main' }}
+        run: mike deploy dev --set-prop hidden=true --push
+
+      - name: Deploy the release docs with mike
+        if: ${{ env.BRANCH != 'main' }}
+        run: mike deploy ${HOPSWORKS_VERSION} --push
+
+      - name: Update latest docs if needed
+        if: ${{ env.BRANCH != 'main' && steps.is-latest.outputs.result == 'true' }}
+        run: mike alias ${HOPSWORKS_VERSION} latest --update-aliases --push
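
The "Is latest release?" step decides whether the current branch is the newest release by converting each `major.minor` version to a single number. The following standalone sketch mirrors that logic outside of GitHub Actions so it can be run locally. The function name `isLatestRelease` and the example branch list are hypothetical; in the workflow, the branch names come from the GitHub branches API and the current version from `HOPSWORKS_VERSION`.

```javascript
// Standalone sketch of the comparison done in the "Is latest release?" step.
// `branchNames` stands in for the names returned by the branches API.
function isLatestRelease(branchNames, currentVersion) {
  const versions = branchNames
    .filter((name) => name.startsWith("branch-"))
    .map((name) => name.replace("branch-", ""));

  // Width of the longest minor component, so that 4.10 ranks above 4.2.
  const minorLength = Math.max(...versions.map((v) => v.split(".")[1].length));

  const convertVersionToNumber = (version) => {
    const parts = version.split(".").map(Number);
    return parts[0] * 10 ** minorLength + parts[1];
  };

  return (
    Math.max(...versions.map(convertVersionToNumber)) ===
    convertVersionToNumber(currentVersion)
  );
}

// Hypothetical branch list for illustration.
const exampleBranches = ["main", "branch-3.7", "branch-4.2", "branch-4.10"];
console.log(isLatestRelease(exampleBranches, "4.10"));
console.log(isLatestRelease(exampleBranches, "4.2"));
```

With the example list, `minorLength` is 2, so 4.2 maps to 402 and 4.10 maps to 410. As in the workflow script, this assumes every `branch-*` name has the form `branch-<major>.<minor>`; a name without a dot would make `split(".")[1]` undefined and throw.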

.github/workflows/mkdocs-test.yml

Lines changed: 50 additions & 11 deletions
@@ -12,25 +12,59 @@ jobs:
         with:
           fetch-depth: 0
 
+      - name: Checkout the API repo
+        uses: actions/checkout@v4
+        with:
+          repository: logicalclocks/hopsworks-api
+          ref: ${{ github.base_ref }}
+          path: hopsworks-api
+
+      - name: Markdownlint
+        uses: DavidAnson/markdownlint-cli2-action@v21
+        with:
+          globs: '**/*.md'
+
+      - name: Cache local Maven repository
+        uses: actions/cache@v4
+        with:
+          path: ~/.m2/repository
+          key: ${{ runner.os }}-maven-${{ hashFiles('java/pom.xml') }}
+          restore-keys: |
+            ${{ runner.os }}-maven-
+
+      - name: Set up JDK 8
+        uses: actions/setup-java@v5
+        with:
+          java-version: "8"
+          distribution: "adopt"
+
+      - name: Build javadoc documentation
+        working-directory: hopsworks-api/java
+        run: mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests && cp -r target/site/apidocs ../../docs/javadoc
+
       - uses: actions/setup-python@v5
         with:
           python-version: "3.10"
 
-      - name: Install ubuntu dependencies
-        run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          activate-environment: true
+          working-directory: hopsworks-api/python
 
-      - name: install deps
-        run: pip3 install -r requirements-docs.txt
+      - name: Install Python API dependencies
+        run: uv sync --extra dev --group docs --project hopsworks-api/python
 
-      - name: setup git
-        run: |
-          git config --global user.name Mike
-          git config --global user.email [email protected]
+      - name: Install Python dependencies
+        run: uv pip install -r requirements-docs.txt
+
+      - name: Install Ubuntu dependencies
+        run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
 
-      - name: test broken links
+      - name: Check for broken links
         run: |
           # run the server
-          mkdocs serve > /dev/null 2>&1 &
+          mkdocs serve > /dev/null 2>&1 &
           SERVER_PID=$!
           echo "mk server in PID $SERVER_PID"
           # Give enough time for deployment
@@ -41,5 +75,10 @@ jobs:
           # If ok just kill the server
           kill -9 $SERVER_PID
 
-      - name: mike deploy docs
+      - name: Setup git for mike
+        run: |
+          git config --global user.name Mike
+          git config --global user.email [email protected]
+
+      - name: Generate the docs with mike
         run: mike deploy 3.2-SNAPSHOT dev -u

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -128,3 +128,4 @@ target/
 # Mac
 .DS_Store
 
+/temp_dir

.markdownlint.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+MD041: false
+MD013: false
+MD033: false
+MD045: false
+MD046: false
+MD052: false
+MD004:
+  style: dash

README.md

Lines changed: 43 additions & 18 deletions
@@ -1,45 +1,70 @@
-# Documentation landing page
+# Hopsworks Documentation
 
-This is the source of the landing page for https://docs.hopsworks.ai
+This is the source of the Hopsworks Documentation published at <https://docs.hopsworks.ai>.
 
 ## Build instructions
 
-### Step 1: Setup python environment
+We use `mkdocs` together with [`mike`](https://github.com/jimporter/mike/) for versioning to build the documentation.
+We also use two main mkdocs plugins, [`mkdocstrings`](https://mkdocstrings.github.io/) with [its Python handler](https://mkdocstrings.github.io/python/), and [`mkdocs-material`](https://squidfunk.github.io/mkdocs-material/) as the theme.
 
-Create a python 3.10 environment, using a python environment manager of your own choosing. For example `virtualenv` or `anaconda`.
+**Background about `mike`:**
+`mike` builds the documentation and commits it as a new directory to the `gh-pages` branch.
+Each directory corresponds to one version of the documentation.
+Additionally, `mike` maintains a JSON file in the root of `gh-pages` with the mappings of versions/aliases for each of the directories available.
+With aliases, you can define extra names like `dev` or `latest`, to indicate stable and unstable releases.
 
-### Step 2
+### Versioning on docs.hopsworks.ai
+
+On docs.hopsworks.ai we implement the following versioning scheme:
+
+- the latest release: rendered with full current version, e.g. **4.4 [latest]** with `latest` alias to indicate that this is the latest stable release.
+- previous stable releases: rendered without alias, e.g. **4.3**.
+- development version: it is built using main branches and is hidden from the version selector, but it is still accessible at <https://docs.hopsworks.ai/dev>.
+
+### Step 1
 
-Clone this repository
+Clone this repository:
 
 ```bash
 git clone https://github.com/logicalclocks/logicalclocks.github.io.git
 ```
 
-### Step 3
-
-Install the required dependencies to build the documentation in the python environment created in the previous step.
+### Step 2
 
-**Note that {PY_ENV} is the path to your python environment.**
+Create a python virtual environment to build the documentation:
 
 ```bash
-cd logicalclocks.github.io
-{PY_ENV}/bin/pip3 install -r requirements-docs.txt
+uv venv
+uv pip install -r requirements-docs.txt
+# Install hopsworks-api for gathering docstrings for the API reference
+uv pip install git+https://github.com/logicalclocks/hopsworks-api.git@main#subdirectory=python
 ```
 
-### Step 4
+Alternatively, you can just activate the virtual environment you use for development of `hopsworks-api` (obtained via `uv sync`); this is how it is done in the actions.
+Namely, in `.github/workflows/mkdocs-release.yml` and `.github/workflows/mkdocs-test.yml`, the `hopsworks-api` repo is cloned, and its uv virtual environment is used with `dev` extra and all development groups.
 
-Use mkdocs to build the documentation and serve it locally
+A callback is set in `hopsworks-api` GitHub Actions, which triggers `.github/workflows/mkdocs-release.yml` on any pushes to release branches (that is, `branch-x.x`).
+
+### Step 3
+
+Build and serve the docs using mike.
 
 ```bash
-{PY_ENV}/bin/mkdocs serve
+# Use the current version instead of 4.4:
+mike deploy 4.4 latest --update-aliases
+# Next, serve the docs to access them locally:
+mike serve
 ```
 
-The documentation should now be available locally on the following URL: http://127.0.0.1:8000/
+**Important**: The first time you serve the docs, you have to choose a default version, as follows:
+
+```bash
+mike set-default latest
+```
 
 ## Adding new pages
 
-The `mkdocs.yml` file of this repository defines the pages to show in the navigation.
+The `mkdocs.yml` file of this repository defines the pages to show in the navigation.
 After adding your new page in the docs folder, you also need to add it to this file for it to show up in the navigation.
 
 ## Checking links
@@ -56,4 +81,4 @@ linkchecker http://127.0.0.1:8000/
 
 # If ok just kill the server
 kill -9 $SERVER_PID
-```
+```
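
The versioning scheme in the README corresponds to the three `mike` steps at the end of `mkdocs-release.yml`. The sketch below restates that mapping as a small function for illustration only. The function name `mikeCommands` is hypothetical, the command strings are copied from the workflow in this commit, and the `isLatest` flag stands in for the result of the workflow's "Is latest release?" step.

```javascript
// Illustrative mapping from a branch name to the mike commands the release
// workflow in this commit would run. Not part of the repository.
function mikeCommands(branch, isLatest) {
  if (branch === "main") {
    // Development docs: deployed as `dev` and hidden from the selector.
    return ["mike deploy dev --set-prop hidden=true --push"];
  }
  // Release branches are named branch-<version>, e.g. branch-4.4.
  const version = branch.replace(/^branch-/, "");
  const commands = [`mike deploy ${version} --push`];
  if (isLatest) {
    // Only the newest release branch moves the `latest` alias.
    commands.push(`mike alias ${version} latest --update-aliases --push`);
  }
  return commands;
}

console.log(mikeCommands("main", false));
console.log(mikeCommands("branch-4.3", false));
console.log(mikeCommands("branch-4.4", true));
```

In this reading, older release branches are redeployed under their own version number without touching `latest`, which matches the "previous stable releases: rendered without alias" item in the scheme.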

docs/concepts/dev/inside.md

Lines changed: 27 additions & 15 deletions
@@ -1,34 +1,46 @@
-Hopsworks provides a complete self-service development environment for feature engineering and model training. You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training and inference pipeline) python environments, you can manage your source code with Git, and you can orchestrate jobs with Airflow.
 
-<img src="../../../assets/images/concepts/dev/dev-inside.svg">
+Hopsworks provides a complete self-service development environment for feature engineering and model training.
+You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training and inference pipeline) python environments, you can manage your source code with Git, and you can orchestrate jobs with Airflow.
+
+<img src="../../../assets/images/concepts/dev/dev-inside.svg" alt="Hopsworks Development Environment" />
 
 ### Jupyter Notebooks
 
-Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL. You can also develop in your IDE (PyCharm, IntelliJ, etc), test locally, and then run your programs as Jobs in Hopsworks. Jupyter notebooks can also be run as Jobs.
+Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL.
+You can also develop in your IDE (PyCharm, IntelliJ, etc), test locally, and then run your programs as Jobs in Hopsworks.
+Jupyter notebooks can also be run as Jobs.
 
 ### Source Code Control
 
-Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket). You can securely checkout code into your project and commit and push updates to your code to your source code repository.
+Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket).
+You can securely check out code into your project and commit and push updates to your code to your source code repository.
 
 ### FTI Pipeline Environments
 
-Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice. This architecture consists of three independently developed and operated ML pipelines:
+Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice.
+This architecture consists of three independently developed and operated ML pipelines:
 
-* Feature pipeline: takes as input raw data that it transforms into features (and labels)
-* Training pipeline: takes as input features (and labels) and outputs a trained model
-* Inference pipeline: takes new feature data and a trained model and makes predictions
+- Feature pipeline: takes as input raw data that it transforms into features (and labels)
+- Training pipeline: takes as input features (and labels) and outputs a trained model
+- Inference pipeline: takes new feature data and a trained model and makes predictions
 
-In order to facilitate the development of these pipelines Hopsworks bundles several python environments containing necessary dependencies. Each of these environments may then also be customized further by cloning it and installing additional dependencies from PyPi, Conda channels, Wheel files, GitHub repos or a custom Dockerfile. Internal compute such as Jobs and Jupyter is run in one of these environments and changes are applied transparently when you install new libraries using our APIs. That is, there is no need to write a Dockerfile, users install libraries directly in one or more of the environments. You can setup custom development and production environments by creating separate projects or creating multiple clones of an environment within the same project.
+In order to facilitate the development of these pipelines Hopsworks bundles several python environments containing necessary dependencies.
+Each of these environments may then also be customized further by cloning it and installing additional dependencies from PyPi, Conda channels, Wheel files, GitHub repos or a custom Dockerfile.
+Internal compute such as Jobs and Jupyter is run in one of these environments and changes are applied transparently when you install new libraries using our APIs.
+That is, there is no need to write a Dockerfile; users install libraries directly in one or more of the environments.
+You can set up custom development and production environments by creating separate projects or creating multiple clones of an environment within the same project.
 
 ### Jobs
 
-In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources. You can run a Job in Hopsworks:
+In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources.
+You can run a Job in Hopsworks:
 
-* From the UI
-* Programmatically with the Hopsworks SDK (Python, Java) or REST API
-* From Airflow programs (either inside our outside Hopsworks)
-* From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
+- From the UI
+- Programmatically with the Hopsworks SDK (Python, Java) or REST API
+- From Airflow programs (either inside or outside Hopsworks)
+- From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
 
 ### Orchestration
 
-Airflow comes out-of-the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one. Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
+Airflow comes out of the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one.
+Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.

docs/concepts/dev/outside.md

Lines changed: 5 additions & 2 deletions
@@ -1,5 +1,8 @@
-You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md). Hopsworks also running SQL queries to compute features in external data warehouses. The Feature Store can also be queried with SQL.
+You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md).
+Hopsworks also supports running SQL queries to compute features in external data warehouses.
+The Feature Store can also be queried with SQL.
 
-There is REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks. However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.
+There is a REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks.
+However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.
 
 <img src="../../../assets/images/concepts/dev/dev-outside.svg">
