112 commits
10fa894
Create initial GitHub Actions workflow
ElliottKasoar Jul 7, 2023
8e55790
Add OpenSearch dependency
ElliottKasoar Jun 29, 2023
4db8407
Create initial OpenSearch interface
ElliottKasoar Jul 31, 2023
0503ab9
Add OpenSearch insertion and deletion functions
ElliottKasoar Jul 31, 2023
25d7640
Add OpenSearch property functions
ElliottKasoar Jul 31, 2023
29cfa96
Add openmock dependency
ElliottKasoar Jul 31, 2023
9f78aa5
Add OpenSearch example notebooks
ElliottKasoar Jul 31, 2023
5153eb4
Add functions required for OpenSearch CLI
ElliottKasoar Aug 2, 2023
74986be
Add function to add properties
ElliottKasoar Aug 3, 2023
d03fa68
Add function to rename properties
ElliottKasoar Aug 3, 2023
735b601
Add function to delete properties
ElliottKasoar Aug 3, 2023
2ac00b9
Add luqum query parsing for OpenSearch
ElliottKasoar Aug 8, 2023
1d4aeff
Fix unit tests for OpenSearch
ElliottKasoar Aug 9, 2023
a5231d2
Refactor default OpenSearch query
ElliottKasoar Aug 15, 2023
c6853de
Add function to return data as dictionary
ElliottKasoar Aug 15, 2023
671c8be
Update OpenSearch useage tutorial
ElliottKasoar Aug 15, 2023
426d80b
Add function to delete documents by ID
ElliottKasoar Aug 16, 2023
c121c40
Add function to update documents by ID
ElliottKasoar Aug 16, 2023
ad9a819
Update extra info parsing for Lucene-like inputs
ElliottKasoar Aug 17, 2023
a92ab96
Add docstrings and type hints
ElliottKasoar Aug 18, 2023
43629e5
Tidy code
ElliottKasoar Aug 21, 2023
4c0f097
Fix formatting for non-string value error
ElliottKasoar Aug 24, 2023
0c0561a
Add class to read csv data
ElliottKasoar Aug 24, 2023
29d9a67
Enable spacegroups to be stored from Atoms
ElliottKasoar Aug 24, 2023
d364e62
Move structure files to list and make units optional
ElliottKasoar Aug 24, 2023
e916b6b
Update README for OpenSearch
ElliottKasoar Aug 24, 2023
801f87d
Update assertions for database tests
ElliottKasoar Aug 25, 2023
9009ded
Add tests for csv reader
ElliottKasoar Aug 25, 2023
74fb835
Add flake8 and black
ElliottKasoar Aug 25, 2023
c67d004
Apply black formatting
ElliottKasoar Aug 25, 2023
ec82064
Conform to flake8 style
ElliottKasoar Aug 25, 2023
462325f
Add notebook for extra information examples
ElliottKasoar Aug 30, 2023
c5d3ad5
Enable kwargs for OpenSearch client settings
ElliottKasoar Aug 31, 2023
4520f51
Fix property function if property not present
ElliottKasoar Aug 31, 2023
13bfe76
Run OpenSearch unit tests with GitHub Actions
ElliottKasoar Aug 25, 2023
f5e2902
Add refresh function for OpenSearch
ElliottKasoar Sep 1, 2023
83c262d
Update unit tests
ElliottKasoar Sep 1, 2023
533b096
Fix delete property function
ElliottKasoar Sep 1, 2023
9560d4f
Add tolerance for failed connections in testing
ElliottKasoar Sep 1, 2023
39af132
Update README
ElliottKasoar Sep 1, 2023
d5f7ec1
Update OpenSearch notebook
ElliottKasoar Sep 1, 2023
c86043c
Apply black formatting
ElliottKasoar Sep 4, 2023
8cd3ef1
Improve property file reader
ElliottKasoar Sep 6, 2023
cc6edb4
Add benchmarking examples
ElliottKasoar Sep 6, 2023
8aacf4b
Enable uploading list of extra info
ElliottKasoar Sep 7, 2023
3d11150
Extend benchmarking
ElliottKasoar Sep 7, 2023
dbeed8a
Fix OpenSearch CLI queries
ElliottKasoar Jan 12, 2024
6c30302
Tidy unit tests
ElliottKasoar Jan 22, 2024
94aa1a9
Add option to disabled SSL
ElliottKasoar Jan 22, 2024
3cb7ded
Add initial CLI integration tests
ElliottKasoar Jan 22, 2024
4938a12
Catch process errors in CLI tests
ElliottKasoar Jan 22, 2024
0f966a4
Add chunk and timeout kwargs for bulk push
ElliottKasoar Jan 24, 2024
1e7dfd1
Refactor backend code
ElliottKasoar Jan 24, 2024
3581141
Update minimum openserch for security
ElliottKasoar Mar 11, 2024
e3c7ae1
Update admin default password in CI
ElliottKasoar Mar 11, 2024
4cf3834
Add query test
ElliottKasoar Mar 12, 2024
33f2677
Add OpenSearch refresh CLI command
ElliottKasoar Mar 13, 2024
1a0364f
Replace luqum with native query_string
ElliottKasoar Mar 12, 2024
278cc3f
Speed up property function
ElliottKasoar Apr 23, 2024
30ad175
Apply black formatter
ElliottKasoar Apr 29, 2024
a6f434a
Add timeout to count
ElliottKasoar Apr 29, 2024
d587cc0
Add comment for property function
ElliottKasoar Apr 29, 2024
478a8a4
Fix deleting keys via CLI
ElliottKasoar Apr 29, 2024
f677822
Fix adding keys via CLI
ElliottKasoar Apr 29, 2024
33dc0bc
Fix renaming properties
ElliottKasoar Apr 29, 2024
87cfafe
Add notebook with example queries
ElliottKasoar Apr 29, 2024
808d5dd
Allow multiple properties to be returned
ElliottKasoar Apr 30, 2024
16a1248
Fix CLI code formatting
ElliottKasoar Apr 30, 2024
1cb7ab2
Apply suggestions from code review
ElliottKasoar Jun 11, 2024
b79779d
Tidy for flake8
ElliottKasoar Jun 11, 2024
2668db9
Tidy README formatting
ElliottKasoar Jun 11, 2024
facc917
Tidy optional type hints
ElliottKasoar Jun 11, 2024
40ed1c9
Add return type hint
ElliottKasoar Jun 11, 2024
d8c980c
Apply suggestions from code review
ElliottKasoar Jun 11, 2024
c3b6d9c
Update abcd/backends/atoms_opensearch.py
ElliottKasoar Jun 11, 2024
c6b9f43
Fix type Optional Union type hints
ElliottKasoar Jun 11, 2024
8befe61
Fix connection type
ElliottKasoar Jun 11, 2024
f96cbb4
Apply suggestions from code review
ElliottKasoar Jun 11, 2024
294a2f9
Fix renaming keys
ElliottKasoar Jun 11, 2024
d462dd5
Tidy setting db
ElliottKasoar Jun 11, 2024
70e0244
Tidy code
ElliottKasoar Jun 11, 2024
a67c5a4
Tidy logs
ElliottKasoar Jun 11, 2024
08d116a
Tidy code
ElliottKasoar Jun 11, 2024
98c3830
Fix extra info
ElliottKasoar Jun 12, 2024
d5c243e
Tidy code
ElliottKasoar Jun 12, 2024
b20b454
Update mongomock tests for pytest
ElliottKasoar Jun 12, 2024
718a3bf
Update mock opensearch tests for pytest
ElliottKasoar Jun 12, 2024
de9593c
Update CLI tests for pytest
ElliottKasoar Jun 12, 2024
eaf290e
Update property tests for pytest
ElliottKasoar Jun 12, 2024
ad06e2e
Update opensearch tests for pytest
ElliottKasoar Jun 12, 2024
45ff643
Fix opensearch mock tests
ElliottKasoar Jun 12, 2024
375daa1
Fix opensearch test
ElliottKasoar Jun 12, 2024
efe1625
Tidy code
ElliottKasoar Jun 12, 2024
78270ec
Fix opensearch test
ElliottKasoar Jun 12, 2024
532ebff
Fix mock opensearch tests
ElliottKasoar Jun 12, 2024
b1152ea
Fix histogram query
ElliottKasoar Jun 13, 2024
c82ec9d
Remove black and flake8
ElliottKasoar Mar 27, 2025
7dd21eb
Apply ruff
ElliottKasoar Mar 27, 2025
f442d20
Apply ruff unsafe fixes
ElliottKasoar Mar 27, 2025
f5208b9
Apply changes from ruff
ElliottKasoar Mar 27, 2025
362effa
Simplify CI matrix
ElliottKasoar Mar 27, 2025
d7a62cf
Simplify connections
ElliottKasoar Mar 27, 2025
3cd7034
Update OpenSearch
ElliottKasoar Mar 27, 2025
1c0b348
Fix calculator with no parameters
ElliottKasoar Mar 27, 2025
ad3a152
Tidy tests
ElliottKasoar Mar 27, 2025
2ae02bf
Tidy building script
ElliottKasoar Mar 28, 2025
b080719
Tidy code from review
ElliottKasoar Mar 28, 2025
2bb5eb3
Remove unused flake8 file
ElliottKasoar Mar 28, 2025
90434d7
Formatting
ElliottKasoar Mar 28, 2025
710b87e
Update ruff
ElliottKasoar Mar 28, 2025
2a8d6a0
Raise errors rather than exit
ElliottKasoar Mar 28, 2025
752984e
Update README for OpenSearch password
ElliottKasoar Mar 31, 2025
7 changes: 7 additions & 0 deletions .ci/opensearch/Dockerfile
@@ -0,0 +1,7 @@
FROM docker:stable

RUN apk add --update bash

COPY run-opensearch.sh /run-opensearch.sh

ENTRYPOINT ["/run-opensearch.sh"]
33 changes: 33 additions & 0 deletions .ci/opensearch/action.yml
@@ -0,0 +1,33 @@
name: 'Run OpenSearch'
description: 'This action spins up an OpenSearch instance that can be accessed and used in your subsequent steps.'

inputs:
  opensearch-version:
    description: 'The version of OpenSearch you want to run'
    required: true
  security-enabled:
    description: 'Enable or disable HTTPS, disabled by default'
    default: 'false'
    required: false
  nodes:
    description: 'Number of nodes in the cluster'
    required: false
    default: 1
  port:
    description: 'Port where you want to run OpenSearch'
    required: false
    default: 9200
  opensearch-initial-admin-password:
    description: 'The password for the user admin in your cluster'
    required: false
    default: 'myStrongPassword_123'

runs:
  using: 'docker'
  image: 'Dockerfile'
  env:
    OPENSEARCH_VERSION: ${{ inputs.opensearch-version }}
    NODES: ${{ inputs.nodes }}
    PORT: ${{ inputs.port }}
    SECURITY_ENABLED: ${{ inputs.security-enabled }}
    OPENSEARCH_INITIAL_ADMIN_PASSWORD: ${{ inputs.opensearch-initial-admin-password }}
9 changes: 9 additions & 0 deletions .ci/opensearch/functions/imports.sh
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
#
# Sets up all the common variables and imports relevant functions
#
# Version 1.0.1
# - Initial version after refactor
# From https://github.com/opensearch-project/opensearch-py/blob/main/.ci/functions/imports.sh

source ./.ci/opensearch/functions/wait-for-container.sh
42 changes: 42 additions & 0 deletions .ci/opensearch/functions/wait-for-container.sh
@@ -0,0 +1,42 @@
#!/usr/bin/env bash
#
# Exposes a routine scripts can call to wait for a container if that container set up a health command
#
# Please source .ci/functions/imports.sh as a whole not just this file
#
# Version 1.0.1
# - Initial version after refactor
# - Make sure wait_for_container is silent
# From https://github.com/opensearch-project/opensearch-py/blob/main/.ci/functions/wait-for-container.sh

function container_running {
  if [[ "$(docker ps -q -f name=$1)" ]]; then
    return 0;
  else return 1;
  fi
}

function wait_for_container {
  set +x
  until ! container_running "$1" || (container_running "$1" && [[ "$(docker inspect -f "{{.State.Health.Status}}" ${1})" != "starting" ]]); do
    echo ""
    docker inspect -f "{{range .State.Health.Log}}{{.Output}}{{end}}" ${1}
    echo -e "\033[34;1mINFO:\033[0m waiting for node $1 to be up\033[0m"
    sleep 4;
  done;

  # Always show logs if the container is running, this is very useful both on CI as well as while developing
  if container_running $1; then
    docker logs $1
  fi

  if ! container_running $1 || [[ "$(docker inspect -f "{{.State.Health.Status}}" ${1})" != "healthy" ]]; then
    echo -e "\033[31;1mERROR:\033[0m Failed to start $1 in detached mode beyond health checks\033[0m"
    echo -e "\033[31;1mERROR:\033[0m dumped the docker log before shutting the node down\033[0m"
    return 1
  else
    echo
    echo -e "\033[32;1mSUCCESS:\033[0m Detached and healthy: ${1}\033[0m"
    return 0
  fi
}
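
The polling logic in `wait_for_container` can be restated in Python for readers less familiar with shell. This is only a sketch: `wait_until_healthy`, `get_status` and `max_polls` are hypothetical names introduced here, not part of the PR. Unlike the shell loop, which relies on Docker's own health retries to leave the "starting" state, the sketch adds an explicit upper bound on the number of polls.

```python
import time
from typing import Callable, Optional


def wait_until_healthy(
    get_status: Callable[[], Optional[str]],
    interval: float = 4.0,
    max_polls: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is no longer "starting", then report health.

    get_status is a hypothetical callable returning the container's health
    status ("starting", "healthy", "unhealthy"), or None when the container
    is not running at all.
    """
    for _ in range(max_polls):
        status = get_status()
        # Stop polling once the container is gone or has left "starting".
        if status is None or status != "starting":
            return status == "healthy"
        sleep(interval)
    # Assumption of this sketch: give up if still "starting" after max_polls.
    return False
```

In a real script `get_status` would wrap something like `docker inspect`; injecting it as a callable keeps the loop easy to exercise without Docker.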
57 changes: 57 additions & 0 deletions .ci/opensearch/run-opensearch.sh
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
source ./.ci/opensearch/functions/imports.sh
set -euxo pipefail

if [[ -z $OPENSEARCH_VERSION ]]; then
  echo -e "\033[31;1mERROR:\033[0m Required environment variable [OPENSEARCH_VERSION] not set\033[0m"
  exit 1
fi

OPENSEARCH_REQUIRED_VERSION="latest"
# Starting in 2.12.0, security demo configuration script requires an initial admin password
if [ "$OPENSEARCH_VERSION" != "$OPENSEARCH_REQUIRED_VERSION" ]; then
  OPENSEARCH_INITIAL_ADMIN_PASSWORD="admin"
fi

for (( node=1; node<=${NODES-1}; node++ ))
do
  port=$((PORT + $node - 1))

  if [[ "$SECURITY_ENABLED" == "true" ]]; then
    healthcmd="curl -vvv -s --insecure -u admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD --fail https://localhost:$port/_cluster/health || exit 1"
    security=($(cat <<-END

END
))
  elif [[ "$SECURITY_ENABLED" == "false" ]]; then
    healthcmd="curl -vvv -s --fail http://localhost:$port/_cluster/health || exit 1"
    security=($(cat <<-END
--env plugins.security.disabled=true
END
))
  fi

  docker run \
    --rm \
    --detach \
    --name="os${node}" \
    --env "cluster.name=docker-opensearch" \
    --env "http.port=${port}" \
    --env discovery.type=single-node \
    --env bootstrap.memory_lock=true \
    --env "OPENSEARCH_JAVA_OPTS=-Xms4g -Xmx4g" \
    --env OPENSEARCH_INITIAL_ADMIN_PASSWORD=$OPENSEARCH_INITIAL_ADMIN_PASSWORD \
    "${security[@]}" \
    --publish "${port}:${port}" \
    --ulimit nofile=65536:65536 \
    --ulimit memlock=-1:-1 \
    --health-cmd="$(echo $healthcmd)" \
    --health-interval=2s \
    --health-retries=20 \
    --health-timeout=2s \
    opensearchproject/opensearch:${OPENSEARCH_VERSION}

  if wait_for_container "os$node"; then
    echo -e "\033[32;1mSUCCESS:\033[0m OpenSearch up and running\033[0m"
  fi
done
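
To make the per-node settings in the loop above easier to follow, here is a small Python sketch that mirrors how the script derives each node's port, health command and extra `docker run` arguments. The function name `node_settings` is hypothetical and exists only for illustration; the command strings are copied from the script.

```python
from typing import List, Tuple


def node_settings(
    base_port: int, node: int, security_enabled: bool, password: str = "admin"
) -> Tuple[int, str, List[str]]:
    """Return (port, health command, extra docker args) for a 1-based node index."""
    # Nodes are numbered from 1, so node 1 listens on base_port itself.
    port = base_port + node - 1
    if security_enabled:
        # With the security plugin on, the health check uses HTTPS and basic auth.
        health_cmd = (
            f"curl -vvv -s --insecure -u admin:{password} --fail "
            f"https://localhost:{port}/_cluster/health || exit 1"
        )
        extra_args: List[str] = []
    else:
        # With security off, plain HTTP is used and the plugin is disabled.
        health_cmd = (
            f"curl -vvv -s --fail http://localhost:{port}/_cluster/health || exit 1"
        )
        extra_args = ["--env", "plugins.security.disabled=true"]
    return port, health_cmd, extra_args
```

For example, with the CI port of 9250, a third node would be published on 9252.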
7 changes: 7 additions & 0 deletions .ci/opensearch/test.sh
@@ -0,0 +1,7 @@
#!/usr/bin/env bash

script_path=$(dirname $(realpath -s $0))
source $script_path/functions/imports.sh
set -euxo pipefail

echo $script_path/functions/imports.sh
23 changes: 22 additions & 1 deletion .github/workflows/ci.yml
@@ -7,12 +7,29 @@ jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: [ "3.9", "3.10", "3.11", "3.12" ]
        python-version: ["3.10", "3.11", "3.12"]
        opensearch: ["latest"]
        security-enabled: ["true", "false"]

    steps:
      - uses: actions/checkout@v4

      - name: Configure sysctl limits
        run: |
          sudo swapoff -a
          sudo sysctl -w vm.swappiness=1
          sudo sysctl -w fs.file-max=262144
          sudo sysctl -w vm.max_map_count=262144

      - name: Start OpenSearch
        uses: ./.ci/opensearch
        with:
          port: 9250
          opensearch-version: ${{ matrix.opensearch }}
          security-enabled: ${{ matrix.security-enabled }}

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
@@ -29,6 +46,10 @@ jobs:
      - name: Run unit tests
        run: |
          poetry run pytest --cov=abcd --cov-report xml --cov-report term:skip-covered
        env:
          port: 9250
          security_enabled: ${{ matrix.security-enabled }}
          opensearch-version: ${{ matrix.opensearch }}

      - name: Upload coverage reports to Codecov
        uses: codecov/[email protected]
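
The `env` block of the test step exports `port`, `security_enabled` and `opensearch-version` to pytest. How the test suite consumes them is not shown in this part of the diff, so the following is only a sketch of one way a test helper might read them. The function name and the fallback defaults (`9200`, `false`, `latest`) are assumptions. Note that `opensearch-version` contains a hyphen, so it cannot be referenced as a shell variable, though a mapping lookup such as `os.environ` can still read it.

```python
import os
from typing import Mapping


def opensearch_test_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect OpenSearch test settings from environment-style variables."""
    return {
        # Assumed default: OpenSearch's standard HTTP port.
        "port": int(env.get("port", "9200")),
        # The CI matrix passes the strings "true" or "false".
        "security_enabled": env.get("security_enabled", "false").lower() == "true",
        # Hyphenated name, read by key rather than as a shell variable.
        "version": env.get("opensearch-version", "latest"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check in isolation.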
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -11,7 +11,7 @@ repos:

  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.9.6
    rev: v0.11.2
    hooks:
      # Run the linter.
      - id: ruff
111 changes: 95 additions & 16 deletions README.md
@@ -12,75 +12,145 @@ Main features:
- Configurations that consist of atom positions, elements, forces, and various metadata are stored as a dictionary by a MongoDB backend.
- There is no predefined schema, any combination of keys are allowed for all configurations.
- Two modes: "discovery" and "download". Both use filter-type queries, but in "discovery" mode, summary statistics of the configurations that pass the filter are reported. In "download" mode, the matching configurations are downloaded and exported to a file.
- The "discovery" mode can be used to learn what keys exist in the set of configurations that have passed the current quiery filter. The user can use this to refine the query.
- The "discovery" mode can be used to learn what keys exist in the set of configurations that have passed the current query filter. The user can use this to refine the query.
- Complex queries on dictionary key-value pairs are allowed, and their logical combinations.

## Installation

### General Setup

creating tables and views
```

```sh
$ pip install git+https://github.com/libAtoms/abcd.git
```

## Setup
Example Docker installation on Ubuntu:

```sh
sudo apt-get update
sudo apt upgrade
sudo apt install docker.io
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker # or exit and log in
```

Docker can be tested by running:

```sh
docker run hello-world
```

Example Python setup on Ubuntu (pip must be updated for poetry to be used successfully):

```sh
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10
sudo apt-get install python3.10-distutils
sudo apt install python3-virtualenv
virtualenv -p /usr/bin/python3.10 venv_10
source venv_10/bin/activate
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
```

If you have an already running mongo server, or install your own, they you are ready to go. Alternatively,
Building and installing ABCD dependencies via poetry:

```sh
git clone https://github.com/libAtoms/abcd.git
curl -sSL https://install.python-poetry.org | python3 -
export PATH="$HOME/.local/bin:$PATH"
cd abcd
poetry install
poetry build
```

### MongoDB

If you have an already running MongoDB server, or install your own, then you are ready to go. Alternatively,

```sh
docker run -d --rm --name abcd-mongodb -v <path-on-your-machine-to-store-database>:/data/db -p 27017:27017 mongo
```

will download and install a docker and run a database in it.

To connect to a mongodb that is already running, use
```

```sh
abcd login mongodb://localhost
```

If you are running `abcd` inside a docker, and want to connect to a mongodb outside that docker use something like this (example is for Mac OS):

```
```sh
abcd login mongodb://docker.for.mac.localhost
```

The above login command will create an `~/.abcd` file with the following contents:

```
```json
{"url": "mongodb://localhost"}
```
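
As a quick check that the login step worked, the stored URL can be read back from this file. The snippet below is a minimal sketch that assumes only the single `url` key shown above; `read_abcd_url` is a hypothetical helper, not part of the `abcd` package.

```python
import json
from pathlib import Path
from typing import Optional


def read_abcd_url(config_path: Optional[Path] = None) -> str:
    """Return the database URL stored by `abcd login`."""
    # Default to the file location described in the README.
    path = config_path if config_path is not None else Path.home() / ".abcd"
    with open(path, encoding="utf-8") as config_file:
        return json.load(config_file)["url"]
```

Calling `read_abcd_url()` with no argument reads `~/.abcd`; passing a path is convenient for testing against a temporary file.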

# Remote access
### OpenSearch
If you have an already running OpenSearch server, or install your own, then you are ready to go. Alternatively,

You can set up an `abcd` user on your machine where the database is running, and then access it remotely for discovering data. Make sure you have the `~/.abcd` file created for this user, then put this in the `.ssh/authorized_keys` file (substituting your public key for the last part):
```sh
sudo swapoff -a # optional
sudo sysctl -w vm.swappiness=1 # optional
sudo sysctl -w fs.file-max=262144 # optional
sudo sysctl -w vm.max_map_count=262144
docker run -d --name abcd-opensearch -v <path-on-your-machine-to-store-database>:/data/db -p 9200:9200 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<PASSWORD>" -it opensearchproject/opensearch:latest
```

will download and install an OpenSearch image and run it. The connection can be tested with:

```sh
curl -vvv -s --insecure -u admin:<PASSWORD> --fail https://localhost:9200
```

To connect to an OpenSearch database that is already running, use

```sh
abcd login opensearch://<USER>:<PASSWORD>@localhost
```
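
The login URL packs the backend, credentials and host into one string. The sketch below shows how such a URL splits into its parts using only the Python standard library. It illustrates the URL format and is not necessarily how `abcd` parses it internally; `split_login_url` and the fallback port of 9200 are assumptions.

```python
from urllib.parse import urlparse


def split_login_url(url: str, default_port: int = 9200) -> dict:
    """Split a login URL such as opensearch://user:password@host[:port]."""
    parsed = urlparse(url)
    return {
        "backend": parsed.scheme,      # e.g. "opensearch" or "mongodb"
        "user": parsed.username,       # None when no credentials are given
        "password": parsed.password,   # None when no credentials are given
        "host": parsed.hostname,
        # Assumed fallback when the URL does not name a port.
        "port": parsed.port or default_port,
    }
```

For instance, `split_login_url("opensearch://admin:secret@localhost:9250")` yields the `opensearch` backend, user `admin`, host `localhost` and port 9250.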

## Remote access

You can set up an `abcd` user on your machine where the database is running, and then access it remotely for discovering data. Make sure you have the `~/.abcd` file created for this user, then put this in the `.ssh/authorized_keys` file (substituting your public key for the last part):

```sh
command="/path/to/abcd --remote ${SSH_ORIGINAL_COMMAND}",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa <public-key> your@email
```

Then you'll be able to access the database remotely using, e.g.
```

```sh
ssh [email protected] summary
```

# GUI through a browser + visualisation
## GUI through a browser + visualisation

The database has a simple GUI, coupled with a visualiser. For now, data needs to be uploaded on the command line, but queries can be run through the browser. The instructions below include running `abcd` from Docker, but you can also run it outside Docker.


#### Usage in docker
## Usage in docker
Currently a manual uploaded image is available, that was built on 7/2/2020 by Tamas K. Stenczel.
To access it:
1. pull the image
```
```sh
docker pull stenczelt/projection-abcd:latest
```

2. create a docker network, which enables the containers to communicate with each other and the outside world as well
```
```sh
docker network create --driver bridge abcd-network
```

3. run the mongo (ABCD) and the visualiser as well
```
```sh
docker run -d --rm --name abcd-mongodb-net -v <path-on-your-machine-to-store-database>:/data/db -p 27017:27017 --network abcd-network mongo

docker run -it --rm --name visualiser-dev -p 9999:9999 --network abcd-network stenczelt/projection-abcd
@@ -91,8 +161,17 @@ To access it:
This will start the visualiser with ABCD integration! Have fun!

After usage, for cleanup:
```

```sh
docker stop visualiser-dev abcd-mongodb-net # stop the containers
docker rm visualiser-dev abcd-mongodb-net # remove them if --rm did not
docker network rm abcd-network # remove the docker network
```

## Testing

Unit tests are automatically run on push and creation of pull requests. Unit testing using mock databases can also be run in the command line using:

```sh
python -m unittest tests
```