A tool for advertising biomedical projects and their associated datasets
The catalogue enhances visibility and accessibility of biomedical datasets, fostering collaboration and accelerating research. It supports data sharing, compliance, and reproducibility across the research ecosystem.
Developed through lean, user-centered design, the catalogue integrates community-accepted metadata models and supports diverse data types. It continues to grow through contributions from ELIXIR-LU and partner projects.
This software is behind the following instances:
Initially launched as the Translational Data Catalogue under IMI and H2020 initiatives, it was jointly developed through ELIXIR Luxembourg, IMI-FAIRplus and IMI-eTRIKS collaborations. Over time, its development and support have evolved into a broader service, now provided by ELIXIR Luxembourg.
The code is available under the AGPL-3.0 license.
- Local installation
- Background Tasks (Celery)
- Docker-compose build
- Single Docker deployment
- Development
Local installation of the development environment and the procedure for the docker version are described below.
Python ≥ 3.10
Solr ≥ 8.2
npm ≥ 7.5.6
sudo apt-get install libsasl2-dev libldap2-dev libssl-dev
sudo dnf install pango cairo gdk-pixbuf2 libffi-devel
sudo dnf install libreoffice-writer
sudo dnf install redis
sudo systemctl enable --now redis
Install python requirements with:
python -m pip install .
The less compiler needs to be installed to generate the CSS files.
sudo npm install less -g
Create the settings.py file by copying the template:
cp datacatalog/settings.py.template datacatalog/settings.py
Modify the settings file (in the datacatalog folder) according to your local environment. The SECRET_KEY parameter needs to be filled with a random string. For maximum security, generate it using python:
import os
os.urandom(24)
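For example (illustrative only), the snippet above can be wrapped in a `print` to produce a value ready to paste into settings.py:

```python
import os

# Print the repr of 24 random bytes; paste the printed value into
# settings.py as SECRET_KEY.
print(repr(os.urandom(24)))
```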
Install the npm dependencies with:
cd datacatalog/static/vendor
npm ci
npm run build
Create a solr core:
$SOLR_INSTALLATION_FOLDER/bin/solr start
$SOLR_INSTALLATION_FOLDER/bin/solr create_core -c datacatalog
Back to the application folder, build the assets:
flask assets build
Initialize the solr schema:
flask indexer init
Index the provided studies, projects and datasets. For local development, change JSON_FILE_PATH from 'data/imi_projects' to 'tests/data/imi_projects_test' or use data from dats-elixir-files.
flask import entities Dats study
flask import entities Dats project
flask import entities Dats dataset
[Optional] Automatically generate sitemap while indexing the datasets:
flask import entities Dats study --sitemap
flask import entities Dats project --sitemap
flask import entities Dats dataset --sitemap
Generate Sitemap:
flask generate_sitemaps
[Optional] Extend Index for studies, projects and datasets:
flask indexer extend project
flask indexer extend study
flask indexer extend dataset
[Optional] Drop connector entities - removes connector entities from solr:
flask indexer drop_connector_entities Daisy dataset
[Optional] Customize the About and Help pages to reflect your services.
Run the development server:
flask run
The application should now be available at http://localhost:5000
To run the unit tests:
pytest --cov .
Note that a different Solr core is used for the tests and has to be created first. By default, it should be called datacatalog_test.
The application uses Celery with Redis for background task processing.
Redis must be running:
sudo dnf install redis
sudo systemctl enable --now redis
# Linux
sudo systemctl start redis
# Docker
docker run -d -p 6379:6379 redis
Start the Celery worker in a separate terminal:
# Development
USE_CELERY=true celery -A celery_worker:celery_app worker --loglevel=info
# With periodic task scheduler (beat)
USE_CELERY=true celery -A celery_worker:celery_app worker --beat --loglevel=info
# Production (with concurrency)
USE_CELERY=true celery -A celery_worker:celery_app worker --loglevel=warning --concurrency=4
For local (non-Docker) async execution, start the web app with the same flag:
USE_CELERY=true flask run
Celery is configured via the CELERY dict in settings.py. Key settings:
| Setting | Default | Description |
|---|---|---|
| `broker_url` | `redis://localhost:6379/0` | Message broker URL |
| `result_backend` | `redis://localhost:6379/0` | Task result storage |
| `task_time_limit` | `300` | Hard time limit (seconds) |
| `task_soft_time_limit` | `240` | Soft time limit (seconds) |
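As a sketch, a CELERY dict in settings.py using the documented defaults might look like the following (the key names are taken from the table above; the environment-variable fallback mirrors the documented override behaviour):

```python
import os

# Sketch of a CELERY configuration dict for settings.py; environment
# variables override the documented defaults when set.
CELERY = {
    "broker_url": os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
    "result_backend": os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/0"),
    "task_time_limit": 300,       # hard limit: the worker kills the task after 300 s
    "task_soft_time_limit": 240,  # soft limit: an exception is raised in the task after 240 s
}
```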
Environment variables CELERY_BROKER_URL and CELERY_RESULT_BACKEND can override defaults.
Set USE_CELERY=true to enable asynchronous task dispatch; when false, tasks run synchronously.
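The sync/async switch can be pictured with a small sketch (the `dispatch` helper here is hypothetical, not part of the codebase):

```python
import os

# Read the toggle from the environment, defaulting to synchronous mode.
USE_CELERY = os.environ.get("USE_CELERY", "false").lower() == "true"

def dispatch(task, *args, use_celery=USE_CELERY, **kwargs):
    """Route a callable through Celery when the flag is set, else run it inline."""
    if use_celery:
        return task.delay(*args, **kwargs)  # queued on the Redis broker
    return task(*args, **kwargs)            # runs synchronously, returns the result

# Synchronous path: the call runs inline and returns 42 immediately.
result = dispatch(lambda x: x + 1, 41, use_celery=False)
```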
Thanks to docker-compose, it is easy to manage all the components (Solr and the web server) required to run the application.
Docker and git must be installed.
(local) and (web container) indicate context of execution.
First, generate the certificates that will be used to enable HTTPS in the reverse proxy. To do so, change directory to docker/nginx/ and execute generate_keys.sh (relies on OpenSSL). If you don't plan to use HTTPS or just want to see the demo running, you can skip this step (warning: it will cause the HTTPS connection to be unsafe!).
Then, copy datacatalog/settings.py.template to datacatalog/settings.py. Edit the settings.py file to add a random string of characters in SECRET_KEY. For maximum security, generate this key in python using:
import os
os.urandom(24)
Then build and start the docker containers by running:
(local) $ docker-compose up --build
That will create a container with the datacatalog web application, and a container for solr (the data will be persisted between runs).
Then, to create solr cores, execute in another console:
(local) $ docker-compose exec solr solr create_core -c datacatalog
(local) $ docker-compose exec solr solr create_core -c datacatalog_test
Then, to fill solr data:
(local) $ docker-compose exec web /bin/bash
(web container) $ flask indexer init
(web container) $ flask import entities Dats study
(web container) $ flask import entities Dats project
(web container) $ flask import entities Dats dataset
(PRESS CTRL+D or type: "exit" to exit)
The web application should now be available with loaded data via http://localhost, and via https://localhost with an SSL connection (beware that most browsers display a warning or block self-signed certificates).
Note: Redis and the Celery worker are optional and enabled with the celery profile:
(local) $ USE_CELERY=true docker-compose --profile celery up --build
Check worker logs with:
(local) $ docker-compose logs -f celery
A Docker container keeps the application in the state it was in when the image was built. Therefore, if you change any files in the project, the container has to be rebuilt for the changes to appear in the application:
docker-compose up --build
If you want to delete the solr data, run the following (this will remove any persisted data; you must redo solr create_core):
docker-compose down --volumes
The datasets, projects and studies are all defined in the files located in the folder data/imi_projects. Those files can be modified to add, delete and modify those entities. After saving the files, rebuild and restart docker-compose with:
Press CTRL+D to stop all the containers, then rebuild and restart them with:
docker-compose up --build
To reindex the entities:
(local) $ docker-compose exec web /bin/bash
(web container) $ flask import entities Dats study
(web container) $ flask import entities Dats project
(web container) $ flask import entities Dats dataset
(PRESS CTRL+D or type: "exit" to exit)
In some cases, you might not want Solr and Nginx to run (for example if there are multiple instances of Data Catalog running). In that case, simply use:
(local) $ docker build . -t "data-catalog"
(local) $ docker run --name data-catalog --entrypoint "gunicorn" -p 5000:5000 -t data-catalog -t 600 -w 2 datacatalog:app --bind 0.0.0.0:5000
Install needed dependencies with:
pip install .[testing]
Configure pre-commit hook for black and flake8:
see https://dev.to/m1yag1/how-to-setup-your-project-with-pre-commit-black-and-flake8-183k