Commit abbbd27

refactor: update doc and contributing guide with module dependencies (#268)
1 parent e5e965b commit abbbd27

483 files changed, +2501 -1892 lines changed


CONTRIBUTING.md

Lines changed: 100 additions & 14 deletions
@@ -1,10 +1,14 @@
 # Contributing to Lance Namespace

 The Lance Namespace codebase is at [lance-format/lance-namespace](https://github.com/lance-format/lance-namespace).
-This codebase contains code of the Lance Namespace specification
-as well as generated clients and servers using OpenAPI generator.
+This codebase contains:

-This project should only be used to make spec changes to Lance Namespace,
+- The Lance Namespace specification
+- The core `LanceNamespace` interface and generic connect functionality for all languages except Rust
+  (for Rust, these are located in the [lance-format/lance](https://github.com/lance-format/lance) repo)
+- Generated clients and servers using OpenAPI generator
+
+This project should only be used to make spec and interface changes to Lance Namespace,
 or to add new clients and servers to be generated based on community demand.
 In general, we welcome more generated components to be added as long as
 the contributor is willing to set up all the automations for generation and publication.
@@ -15,17 +19,94 @@ For contributing changes to implementations other than the directory and REST na
 or for adding new namespace implementations,
 please go to the [lance-namespace-impls](https://github.com/lance-format/lance-namespace-impls) repo.

+## Project Dependency
+
+This project contains the core Lance Namespace specification, interface and generated modules across all languages.
+The dependency structure varies by language due to different build and distribution models.
+
+### Rust
+
+For Rust, the interface module `lance-namespace` and implementations (`lance-namespace-impls` for REST and directory namespaces)
+are located in the core [lance-format/lance](https://github.com/lance-format/lance) repository.
+This is because Rust uses source code builds, and separating modules across repositories makes dependency management complicated.
+
+The dependency chain is: `lance-namespace` → `lance` → `lance-namespace-impls`
+
+### Other Languages (e.g. Python, Java)
+
+For Python, Java, and other languages, the core `LanceNamespace` interface and generic connect functionality
+are maintained in **this repository** (e.g., `lance-namespace` for Python, `lance-namespace-core` for Java).
+The core [lance-format/lance](https://github.com/lance-format/lance) repository then imports these modules.
+
+The reason for this import direction is that `lance-namespace-impls` (REST and directory namespace implementations)
+are used in the Lance Python and Java bindings, and are exposed back through the corresponding language interfaces.
+These language interfaces can also be imported dynamically without the need to have a dependency of the Lance core library bindings in those languages.
+
+### Other Implementations
+
+For namespace implementations other than directory and REST namespaces,
+those are stored in the [lance-format/lance-namespace-impls](https://github.com/lance-format/lance-namespace-impls) repository,
+with one implementation per language.
+
+### Dependency Diagram
+
+```mermaid
+flowchart TB
+    subgraph this_repo["lance-namespace repo"]
+        spec["Spec & Generated Clients"]
+        py_core["Python: lance-namespace"]
+        java_core["Java: lance-namespace-core"]
+    end
+
+    subgraph lance_repo["lance repo"]
+        subgraph rust_modules["Rust Modules"]
+            rs_ns["lance-namespace"]
+            rs_lance["lance"]
+            rs_impls["lance-namespace-impls<br/>(dir, rest)"]
+        end
+        py_lance["Python: lance"]
+        java_lance["Java: lance"]
+    end
+
+    subgraph impls_repo["namespace-impls repo"]
+        polaris["Apache Polaris"] ~~~ hive["Apache Hive"] ~~~ iceberg_rest["Apache Iceberg REST"] ~~~ unity["Unity Catalog"] ~~~ glue["AWS Glue"]
+    end
+
+    %% Rust dependencies (source build)
+    rs_ns --> rs_lance
+    rs_lance --> rs_impls
+
+    %% Python/Java dependencies
+    py_core --> py_lance
+    java_core --> java_lance
+    rs_impls -.-> py_lance
+    rs_impls -.-> java_lance
+
+    %% Other implementations depend on core interfaces and lance bindings
+    py_core -.-> impls_repo
+    java_core -.-> impls_repo
+    py_lance -.-> impls_repo
+    java_lance -.-> impls_repo
+
+    style this_repo fill:#1565c0,color:#fff
+    style lance_repo fill:#e65100,color:#fff
+    style impls_repo fill:#7b1fa2,color:#fff
+    style rust_modules fill:#ff8a65,color:#000
+```
+
 ## Repository structure

 This repository currently contains the following components:

-| Component | Language | Path | Description |
-|----------------------|----------|----------------------------------------|------------------------------------------------------------|
-| spec | | docs/src/spec | Lance Namespace Specification |
-| Rust Reqwest Client | Rust | rust/lance-namespace-reqwest-client | Generated Rust reqwest client for Lance REST Namespace |
-| Python UrlLib3 Client| Python | python/lance_namespace_urllib3_client | Generated Python urllib3 client for Lance REST Namespace |
-| Java Apache Client | Java | java/lance-namespace-apache-client | Generated Java Apache HTTP client for Lance REST Namespace |
-| Java Springboot Server| Java | java/lance-namespace-springboot-server | Generated Java SpringBoot server for Lance REST Namespace |
+| Component | Language | Path | Description |
+|-----------------------|----------|----------------------------------------|------------------------------------------------------------|
+| Spec | | docs/src | Lance Namespace Specification |
+| Python Core | Python | python/lance_namespace | Core LanceNamespace interface and connect functionality |
+| Python UrlLib3 Client | Python | python/lance_namespace_urllib3_client | Generated Python urllib3 client for Lance REST Namespace |
+| Java Core | Java | java/lance-namespace-core | Core LanceNamespace interface and connect functionality |
+| Java Apache Client | Java | java/lance-namespace-apache-client | Generated Java Apache HTTP client for Lance REST Namespace |
+| Java SpringBoot Server| Java | java/lance-namespace-springboot-server | Generated Java SpringBoot server for Lance REST Namespace |
+| Rust Reqwest Client | Rust | rust/lance-namespace-reqwest-client | Generated Rust reqwest client for Lance REST Namespace |


 ## Install uv
@@ -74,15 +155,20 @@ Start the server with:
 make serve-docs
 ```

-### Generated Doc from OpenAPI Spec
+### Generated Model Documentation

-The OpenAPI spec at `docs/src/rest.yaml` is digested and generated as Markdown documents for better readability.
-Generate the latest documents with:
+The operation request and response model documentation is generated from the Java Apache Client.
+When building or serving docs, the Java client must be generated first to produce the model Markdown files,
+which are then copied to `docs/src/operations/models/`.
+
+This happens automatically when running:

 ```shell
-make gen-docs
+make build-docs # or make serve-docs
 ```

+These commands depend on `gen-java` to ensure the Java client docs are up-to-date before building the documentation.
+
 ### Understanding the Build Process

 The contents in `lance-namespace/docs` are for the ease of contributors to edit and preview.
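
Note on the dependency diagram added to CONTRIBUTING.md: the arrows can be read as "is built or imported before". The following is a minimal sketch, not part of the commit, that encodes those edges and derives one valid build order with Python's standard `graphlib`. The `rust:`, `python:` and `java:` node labels are made up here for readability, and solid and dotted arrows are treated alike, which is an assumption.

```python
from graphlib import TopologicalSorter

# Edges copied from the Mermaid diagram as (dependency, dependent).
# Node labels are illustrative only; they are not package names from the repo.
EDGES = [
    ("rust:lance-namespace", "rust:lance"),
    ("rust:lance", "rust:lance-namespace-impls"),
    ("python:lance-namespace", "python:lance"),
    ("java:lance-namespace-core", "java:lance"),
    ("rust:lance-namespace-impls", "python:lance"),
    ("rust:lance-namespace-impls", "java:lance"),
    ("python:lance-namespace", "namespace-impls repo"),
    ("java:lance-namespace-core", "namespace-impls repo"),
    ("python:lance", "namespace-impls repo"),
    ("java:lance", "namespace-impls repo"),
]


def build_order(edges):
    """Return one valid build order; graphlib raises CycleError on a cycle."""
    graph = {}
    for dependency, dependent in edges:
        # TopologicalSorter expects node -> set of predecessors.
        graph.setdefault(dependent, set()).add(dependency)
        graph.setdefault(dependency, set())
    return list(TopologicalSorter(graph).static_order())


order = build_order(EDGES)
print(order)
```

Because every other node is an ancestor of the namespace-impls repo in this graph, it always comes last in the computed order, which matches the text: other implementations depend on both the core interfaces and the Lance bindings.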

Makefile

Lines changed: 3 additions & 7 deletions
@@ -50,16 +50,12 @@ gen-java:
 build-java:
 	cd java; make build

-.PHONY: sync gen-docs
-gen-docs:
-	cd docs; make gen
-
 .PHONY: build-docs
-build-docs:
+build-docs: gen-java
 	cd docs; make build

 .PHONY: serve-docs
-serve-docs:
+serve-docs: gen-java
 	cd docs; make serve

 .PHONY: sync
@@ -70,7 +66,7 @@ sync:
 clean: clean-rust clean-python clean-java

 .PHONY: gen
-gen: lint gen-docs gen-rust gen-python gen-java
+gen: lint gen-rust gen-python gen-java

 .PHONY: build
 build: lint build-docs build-rust build-python build-java

README.md

Lines changed: 17 additions & 5 deletions
@@ -1,10 +1,22 @@
 # Lance Namespace

-**Lance Namespace** is an open specification on top of the storage-based Lance data format
-to standardize access to a collection of Lance tables (a.k.a. Lance datasets).
-It describes how a metadata service like Apache Hive MetaStore (HMS),
-Apache Gravitino, Unity Catalog, etc. should store and use Lance tables,
-as well as how ML/AI tools and analytics compute engines should integrate with Lance tables.
+**Lance Namespace** is an open specification for describing access and operations against a collection of tables in a multimodal lakehouse.
+The spec provides a unified model for table-related objects, their relationships within a hierarchy,
+and the operations available on these objects — enabling integration with metadata services and compute engines alike.
+
+The Lance Namespace spec consists of three main parts:
+
+1. **Client-Side Standardized Access Spec**: A consistent abstraction that adapts to various catalog specifications
+   (e.g. Apache Gravitino, Apache Polaris, Unity Catalog, Apache Hive Metastore, Apache Iceberg REST Catalog),
+   allowing users to choose any catalog to store and use tables.
+
+2. **Directory Namespace Spec**: A natively maintained storage-only catalog spec that is compliant with the
+   Lance Namespace client-side access spec. It requires no external metadata service — tables are organized directly
+   on storage (local filesystem, S3, GCS, etc.) with metadata stored alongside the data.
+
+3. **REST Namespace Spec**: A natively maintained REST-based catalog spec that is compliant with the Lance
+   Namespace client-side access spec. It is suitable for teams that want to develop their own custom handling,
+   ideal for adoption by data infrastructure teams in enterprise environments with high customization requirements.

 For more details, please visit the [documentation website](https://lance.org/format/namespace).

docs/Makefile

Lines changed: 29 additions & 5 deletions
@@ -10,14 +10,38 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-.PHONY: gen
-gen:
-	cd src; uv run update_line_numbers.py
+# Java model docs source and destination
+JAVA_DOCS_SRC := ../java/lance-namespace-apache-client/docs
+MODELS_DEST := src/operations/models
+
+# API files to exclude (Java-specific, not data models)
+API_FILES := DataApi.md IndexApi.md MetadataApi.md NamespaceApi.md TableApi.md TagApi.md TransactionApi.md
+
+.PHONY: gen-models
+gen-models:
+	@echo "Copying model docs from Java generated docs..."
+	@rm -rf $(MODELS_DEST)
+	@mkdir -p $(MODELS_DEST)
+	@for f in $(JAVA_DOCS_SRC)/*.md; do \
+		filename=$$(basename "$$f"); \
+		skip=false; \
+		for api in $(API_FILES); do \
+			if [ "$$filename" = "$$api" ]; then \
+				skip=true; \
+				break; \
+			fi; \
+		done; \
+		if [ "$$skip" = "false" ]; then \
+			cp "$$f" $(MODELS_DEST)/; \
+		fi; \
+	done
+	@echo "title: Models" > $(MODELS_DEST)/.pages
+	@echo "Model docs copied to $(MODELS_DEST)"

 .PHONY: build
-build: gen
+build: gen-models
 	uv run mkdocs build

 .PHONY: serve
-serve: gen
+serve: gen-models
 	uv run mkdocs serve
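
Note on the `gen-models` recipe above: the shell loop copies every Markdown file from the generated Java client docs except the per-API pages. As a rough illustration only (the commit itself uses the Makefile recipe), the same filter-and-copy logic might look like this in Python. The function name `copy_model_docs` is hypothetical; the exclusion list is the `API_FILES` value from the diff.

```python
import shutil
from pathlib import Path

# Same exclusion list as API_FILES in docs/Makefile.
API_FILES = {
    "DataApi.md", "IndexApi.md", "MetadataApi.md", "NamespaceApi.md",
    "TableApi.md", "TagApi.md", "TransactionApi.md",
}


def copy_model_docs(src: Path, dest: Path) -> list[str]:
    """Recreate dest and copy every *.md from src that is not an API page."""
    shutil.rmtree(dest, ignore_errors=True)  # mirrors `rm -rf $(MODELS_DEST)`
    dest.mkdir(parents=True)                 # mirrors `mkdir -p $(MODELS_DEST)`
    copied = []
    for md in sorted(src.glob("*.md")):
        if md.name in API_FILES:
            continue  # Java-specific API page, not a data model
        shutil.copy(md, dest / md.name)
        copied.append(md.name)
    # mirrors `echo "title: Models" > $(MODELS_DEST)/.pages`
    (dest / ".pages").write_text("title: Models\n")
    return copied


# Usage with the paths from the Makefile, run from the docs/ directory:
# copy_model_docs(Path("../java/lance-namespace-apache-client/docs"),
#                 Path("src/operations/models"))
```

Note that the destination is deleted and recreated on every run, so stale model pages from renamed or removed models do not linger in `docs/src/operations/models/`.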

docs/mkdocs.yml

Lines changed: 8 additions & 1 deletion
@@ -38,7 +38,11 @@ theme:
 markdown_extensions:
   - admonition
   - pymdownx.details
-  - pymdownx.superfences
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
   - pymdownx.highlight:
       anchor_linenums: true
       line_spans: __span
@@ -64,3 +68,6 @@ extra:
     - icon: fontawesome/brands/discord
       link: https://discord.gg/lance

+extra_javascript:
+  - https://unpkg.com/mermaid@10/dist/mermaid.min.js
+

docs/src/.pages

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 nav:
   - index.md
   - operations
-  - impls
+  - Directory Namespace: dir
+  - REST Namespace: rest

docs/src/dir/.pages

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+nav:
+  - Catalog Spec: catalog-spec.md
+  - Implementation Spec: impl-spec.md

docs/src/impls/dir.md renamed to docs/src/dir/catalog-spec.md

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
-# Lance Directory Namespace
+# Lance Directory Namespace Spec

 **Lance directory namespace** is a Lance namespace implementation that stores tables in a directory structure
-on any local or remote storage system. It supports two modes:
+on any local or remote storage system. It has gone through 2 major spec versions:

 - **V1 (Directory Listing)**: A lightweight, simple 1-level namespace that discovers tables by scanning the directory.
 - **V2 (Manifest)**: A more advanced implementation backed by a manifest table (a Lance table) that supports nested namespaces and better performance at scale.
@@ -140,17 +140,17 @@ Please visit [Lance ObjectStore Configurations](https://lance.org/guide/object_s

 ### Compatibility Mode

-`manifest_enabled` and `dir_listing_enabled` are used to control using V1 or V2 scheme.
+`manifest_enabled` and `dir_listing_enabled` are used to control using V1 or V2 spec.
 By default we enable both V1 and V2, this means:

 1. When checking if a table exists in root namespace, it first checks if the table exists in the manifest, then checks if the `<table_name>.lance` exists.
 2. When listing tables in root namespace, it merges tables from both manifest and directory listing, deduplicating by location and table names, manifest tables taking precedence.
 3. When creating tables in root namespaces, it registers them in the manifest and uses V1 `<table_name>.lance` naming for root namespace tables.
 4. If a table in root namespace is renamed, it will start to follow the V2 path definition.
-5. For operations in child namespaces, only V2 scheme is used.
+5. For operations in child namespaces, only V2 spec is used.

 ### Migration from V1 to V2

 A migration should add all the V1 table directory paths to the manifest.
-Once the user is certain there is no table following v1 scheme,
+Once the user is certain there is no table following v1 spec,
 `dir_listing_enabled` can be set to `false` to disable the compatibility mode.
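
Note on the compatibility-mode rules above: rules 1 and 2 (existence check and listing in the root namespace) can be sketched in a few lines. This is an illustrative model only, not the directory namespace implementation. `TableEntry`, `list_root_tables` and `root_table_exists` are hypothetical names, and the sketch assumes an entry counts as a duplicate when either its name or its location has already been seen, which is one reading of "deduplicating by location and table names".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    name: str
    location: str


def list_root_tables(manifest, dir_listing,
                     manifest_enabled=True, dir_listing_enabled=True):
    """Merge V2 (manifest) and V1 (directory listing) tables; manifest wins."""
    sources = []
    if manifest_enabled:
        sources.append(manifest)      # visited first, so it takes precedence
    if dir_listing_enabled:
        sources.append(dir_listing)

    merged, seen_names, seen_locations = [], set(), set()
    for source in sources:
        for entry in source:
            # Assumption: a match on either name or location is a duplicate.
            if entry.name in seen_names or entry.location in seen_locations:
                continue
            seen_names.add(entry.name)
            seen_locations.add(entry.location)
            merged.append(entry)
    return merged


def root_table_exists(name, manifest_names, directory_entries):
    """Check the manifest first, then the V1 `<table_name>.lance` directory."""
    return name in manifest_names or f"{name}.lance" in directory_entries
```

Setting `dir_listing_enabled=False` in this sketch corresponds to the post-migration state described above, where only manifest entries are returned.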
