99 changes: 11 additions & 88 deletions docs/developer-guide.md
@@ -116,6 +116,12 @@ work. Starting from the root of the Git repository:
./scripts/build/buildForTest.sh
```

You can disable building the Rust code by passing `-DskipRust` as an argument to that script, or to any Maven build.
This can speed up the build if you don't need the DataFusion data engine. If a previous build already included Rust,
skipping Rust will reuse the binaries from that build.

When running Maven directly, you can pass `-Pquick` to skip tests and linting.

### Sleeper CLI

To build the Sleeper CLI, you can run this script:
@@ -132,64 +138,11 @@ If you have the CLI installed already it will be replaced with the version that
is different in the version you installed before, it will not be replaced. You can find it
at `$HOME/.local/bin/sleeper`, and manually overwrite it with the contents of `./scripts/cli/runInDocker.sh`.

### Java

To build the Java code only, without installing it for the scripts:

```bash
cd java
mvn clean install -Pquick
```

Removing the '-Pquick' option will cause the unit and integration tests to run.

### Disabling Rust component

You can disable the building of the Rust modules with:

```bash
cd java
mvn clean install -Pquick -DskipRust=true
```

### Publishing Maven artifacts

There is a script [`scripts/dev/publishMaven.sh`](/scripts/dev/publishMaven.sh) to publish the Maven artifacts,
including all modules and the fat jars used to deploy from and run scripts.

There is also a script [`scripts/dev/publishFatJars.sh`](/scripts/dev/publishFatJars.sh) to publish just the fat jars
used for deployment and running scripts.

At some point one of these two will likely be removed.

The publishFatJars.sh version takes in two arguments:
- The repository url to publish.
- The ID of a server in a local m2 settings file which should contain authentication details.
### Publishing artefacts

You can test this locally by using a repository URL like `file:/path/to/output`, which will publish the files to the local file system.

The publishMaven.sh version accepts options to pass through to Maven, including `-DaltDeploymentRepository`, documented
against [the Maven plugin](https://maven.apache.org/plugins/maven-deploy-plugin/deploy-mojo.html). If you don't set a
deployment repository it will publish the files to the local file system at `/tmp/sleeper/m2`.

To set up the local m2 settings file, you can follow this guide: [Baeldung guide to Maven settings servers](https://www.baeldung.com/maven-settings-xml#5-servers)

The development team are adding a way to retrieve and publish jars to AWS. Right now we only support deploying to AWS
from jars that were built locally, but in the future you will be able to deploy jars from a Maven repository as well.

### Publishing Docker images

There is a script [`scripts/dev/publishDocker.sh`](/scripts/dev/publishDocker.sh) to publish the Docker images to a
repository.

It takes in two arguments:
* The repository prefix path.
* An optional boolean for whether the images should be built for multiple platforms. This defaults to true.
See [StackDockerImage.java](/java/clients/src/main/java/sleeper/clients/deploy/container/StackDockerImage.java) for more details.

The development team are adding a way to retrieve and publish Docker images to AWS. Right now we only support uploading
the images to AWS if they were built locally, but in the future you will be able to upload images from an external
repository as well.
Tools are available to publish built artefacts to shared repositories, and to install them locally to avoid the need to
build Sleeper yourself. We do not currently publish artefacts publicly.
See [publishing artefacts](development/publishing.md) for how to set this up yourself.

## Using the codebase

@@ -284,34 +237,4 @@ See the [release process guide](development/release-process.md) for instructions

## Development scripts

In the `/scripts/dev` folder are some scripts that can assist you while working on Sleeper:

#### `showInternalDependencies.sh`

This will display a graph of the dependencies between Sleeper's Maven modules. You can use this to explore how the
modules relate to one another.

#### `generateDocumentation.sh`

This will regenerate the examples and templates for Sleeper configuration properties files. Use this if you've made any
changes to Sleeper configuration properties. This will propagate any changes to property descriptions, ordering,
grouping, etc.

#### `cleanupLogGroups.sh`

When deploying multiple instances (or running multiple system tests), many log groups will be generated. This can make
it difficult to find the logs you need to view. This script will delete any log groups that meet all of the following
criteria:

* Its name does not contain the name of any deployed CloudFormation stack
* Either it's empty, or it has no retention period and is older than 30 days

This can be used to limit the number of log groups in your AWS account, particularly if all your log groups are
deployed by the CDK or CloudFormation, with the stack name in the log group name.

Note that this will not delete log groups for recently deleted instances of Sleeper, so you will still need a different
instance ID when deploying a new instance to avoid naming collisions with existing log groups.

#### `updateVersionNumber.sh`

This is used during the release process to update the version number across the project (see below).
See [development scripts](development/dev-scripts.md) for scripts that can assist you while working on Sleeper.
96 changes: 96 additions & 0 deletions docs/development/dev-scripts.md
@@ -0,0 +1,96 @@
## Development scripts

In the `/scripts/dev` folder are some scripts that can assist you while working on Sleeper:

#### `buildDockerImage.sh`

This will build a single Docker image, and can be used to prepare for execution of the local Docker image tests, e.g.
`QueryLambdaDockerImageST`. You can run it like this:

```bash
./scripts/dev/buildDockerImage.sh query-lambda test
```

The first parameter is the name of the image, as listed in the documentation
of [Docker images](../deployment/images-to-upload.md). The second parameter is the tag, usually "test" for an automated
Docker image test.

#### `checkNotices.sh`

This will check whether all managed Maven dependencies have been included in the NOTICES file at the root of the
repository.

#### `checkRustStyle.sh`

This runs linting on the Rust code.

#### `checkSpotBugs.sh`

This runs SpotBugs on specified Java modules. This is separated from the other linting as SpotBugs is quite slow.
Here's an example of how to run it on multiple modules (note no spaces between the modules):

```bash
./scripts/dev/checkSpotBugs.sh core,ingest/ingest-core
```
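
If you keep the modules in a shell array, one way to build the comma-separated argument is to join the array using
`IFS`. This is a minimal sketch. The `join_modules` helper is hypothetical and is not part of the Sleeper scripts.

```bash
# Hypothetical helper: join its arguments with commas and no spaces,
# which is the format described above for the module list.
join_modules() {
    local IFS=,
    echo "$*"
}

modules=(core ingest/ingest-core)
module_list="$(join_modules "${modules[@]}")"
echo "$module_list"
```

The result can then be passed to the script as `./scripts/dev/checkSpotBugs.sh "$module_list"`.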

#### `checkStyle.sh`

This runs linting on the Java code, except for SpotBugs as that is quite slow.

#### `cleanupLogGroups.sh`

When deploying multiple instances (or running multiple system tests), many log groups will be generated. This can make
it difficult to find the logs you need to view. This script will delete any log groups that meet all of the following
criteria:

* Its name does not contain the name of any deployed CloudFormation stack
* Either it's empty, or it has no retention period and is older than 30 days

This can be used to limit the number of log groups in your AWS account, particularly if all your log groups are
deployed by the CDK or CloudFormation, with the stack name in the log group name.

Note that this will not delete log groups for recently deleted instances of Sleeper, so you will still need a different
instance ID when deploying a new instance to avoid naming collisions with existing log groups.
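
The criteria above can be sketched as a shell predicate. This illustrates the rule as described here and is not the
script's actual implementation. The function name, its arguments, and the use of a stored size of zero to mean "empty"
are all assumptions.

```bash
# Hypothetical predicate mirroring the criteria described above.
# Arguments: log group name, stored bytes, retention in days (empty if unset),
# creation time in epoch seconds, then the names of all deployed stacks.
should_delete_log_group() {
    local name="$1" stored_bytes="$2" retention_days="$3" created_epoch="$4"
    shift 4
    local stack
    for stack in "$@"; do
        # Keep any log group whose name contains a deployed stack name
        if [[ "$name" == *"$stack"* ]]; then
            return 1
        fi
    done
    # Assumption: "empty" means no stored data. Empty groups qualify at any age.
    if (( stored_bytes == 0 )); then
        return 0
    fi
    # Otherwise require no retention period and an age of more than 30 days
    local now age_days
    now="$(date +%s)"
    age_days=$(( (now - created_epoch) / 86400 ))
    [[ -z "$retention_days" ]] && (( age_days > 30 ))
}
```

The function returns success when a log group meets both criteria, so it can be used directly in an `if` statement.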

#### `copyRustToJava.sh`

This builds the Rust code and copies the binaries into the Maven project, so that the Rust code is available when you
run from inside your IDE. This is intended for use when running integration tests that call from Java into Rust.

#### `generateDocumentation.sh`

This will regenerate the examples and templates for Sleeper configuration properties files. Use this if you've made any
changes to Sleeper configuration properties. This will propagate any changes to property descriptions, ordering,
grouping, etc.

#### `publishDocker.sh`

Publishes Docker images to a remote repository. See [publishing artefacts](publishing.md).

#### `publishMaven.sh`

Publishes Maven artifacts to a remote repository. See [publishing artefacts](publishing.md).

#### `showInternalDependencies.sh`

This will display a graph of the dependencies between Sleeper's Maven modules. You can use this to explore how the
modules relate to one another.

#### `updateVersionNumber.sh`

This is used during the release process to update the version number across the project. See
the [release process guide](release-process.md).

#### `validateProjectChunks.sh`

Checks the configuration of the build for GitHub Actions. This compares `.github/config/chunks.yaml` against the Maven
project and the GitHub Actions workflows under `.github/workflows/chunk-<id>.yaml`.

This is a split build where different Maven modules are built in different GitHub Actions workflows. Each workflow
builds a single "chunk", with a number of modules. The `chunks.yaml` file defines the chunks, and therefore which
modules should be built together in the same workflow.

This script checks that all Maven modules are included in the build. There are also build triggers that need to be
hard-coded in the GitHub Actions workflow. The script checks that the triggers are set to run each workflow on any
change to files under its modules, or under modules that they depend on.
80 changes: 80 additions & 0 deletions docs/development/publishing.md
@@ -0,0 +1,80 @@
Publishing artefacts
====================

We have scripts to publish built artefacts to shared repositories, and to install them locally to avoid the need to
build Sleeper yourself. We do not currently publish artefacts publicly.

### Publishing Maven artifacts

There is a script [`scripts/dev/publishMaven.sh`](/scripts/dev/publishMaven.sh) to publish the Maven artifacts,
including all modules and the fat jars used to deploy from and run scripts.

This accepts options to pass through to Maven, including `-DaltDeploymentRepository`, documented
against [the Maven plugin](https://maven.apache.org/plugins/maven-deploy-plugin/deploy-mojo.html). If you don't set a
deployment repository it will publish the files to the local file system at `/tmp/sleeper/m2`.

Here's an example of running this script:

```bash
./scripts/dev/publishMaven.sh -DaltDeploymentRepository=my-repo-id::https://my.repository.com/path
```

Your Maven settings file will need to have this repository declared, with a matching ID and any necessary
authentication. Here's a guide to set this up: https://www.baeldung.com/maven-settings-xml#5-servers

You can test this locally by using a repository URL like `file:/path/to/output`, which will publish the files to the
local file system.
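
A local test along these lines might look like the sketch below. The repository ID `local-test` and the temporary
output directory are placeholders chosen for illustration. The `id::url` form follows the remote example above.

```bash
# Placeholder output directory for the published files
output_dir="$(mktemp -d)"
# Same id::url form as the remote example, with a file: URL instead
repo_arg="-DaltDeploymentRepository=local-test::file:${output_dir}"

# Run from the root of the Git repository; skipped if the script is not present
if [ -x ./scripts/dev/publishMaven.sh ]; then
    ./scripts/dev/publishMaven.sh "$repo_arg"
fi
echo "Published files, if any, are under ${output_dir}"
```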

### Publishing Docker images

There is a script [`scripts/dev/publishDocker.sh`](/scripts/dev/publishDocker.sh) to publish the Docker images to a
repository. It can be used like this:

```bash
./scripts/dev/publishDocker.sh my.registry.com/path
```

The first argument is the prefix that will begin each Docker image name. It should include the hostname and any path
that you want to be used before the path component for each image. In this example images will be pushed like
`my.registry.com/path/ingest`, `my.registry.com/path/query-lambda`.
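
The naming rule can be sketched as a small shell function. The `image_ref` helper is hypothetical and not part of the
Sleeper scripts. Dropping a trailing slash from the prefix is a convenience of this sketch, not confirmed behaviour of
`publishDocker.sh`.

```bash
# Hypothetical helper: combine the repository prefix with an image name,
# following the naming described above. A trailing slash on the prefix is dropped.
image_ref() {
    local prefix="${1%/}" image="$2"
    echo "${prefix}/${image}"
}

image_ref my.registry.com/path ingest
image_ref my.registry.com/path query-lambda
```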

You can pass an optional second argument for whether to create a new Docker builder. By default a Docker builder will be
created that is capable of publishing multiplatform images, like this:

```bash
docker buildx create --name sleeper --use
```

This may not be suitable for all use cases. You can disable this by passing "false" as the second argument. In that
case, you will need to ensure a Docker builder is set that can build multiplatform images before calling this script.

### Installing published artefacts

We have scripts to install Sleeper from published artefacts. We have not yet published Sleeper to Maven Central or
Docker Hub. To install your own artefacts published as in the sections above, you can follow these steps:

1. Prepare a clone of this Git repository.
2. Use `scripts/deploy/installJarsFromMaven.sh` to retrieve the jars from Maven.
3. Use `scripts/deploy/setDeployFromRemoteDocker.sh` to configure the Sleeper scripts to pull published Docker images.
4. Use the Sleeper scripts as though you had built from scratch.

The `installJarsFromMaven.sh` script can be used like this:

```bash
./scripts/deploy/installJarsFromMaven.sh <version> ./scripts/jars -DremoteRepositories=my-repo-id::https://my.repository.com/path
```

Your Maven settings file will need to have your repository declared, with a matching ID and any necessary
authentication. Here's a guide to set this up: https://www.baeldung.com/maven-settings-xml#5-servers

The version must be the Maven version as it was in the Sleeper `java/pom.xml` when it was published to the repository.

The `setDeployFromRemoteDocker.sh` script can be used like this:

```bash
./scripts/deploy/setDeployFromRemoteDocker.sh my.registry.com/path
```

The argument to the script must be the prefix you used to publish the images. This script will create a configuration
file under the templates directory that changes how Docker images are pushed to AWS ECR during deployment, so that they
are pulled from your repository instead of being built locally.