This system is responsible for controlling the full benchmark environment for our MPS. It provides a web API to control benchmarks of the target system (our MPS node) using a specific benchmark configuration, with the capacity to override specific configurations of the target system.
There are a few requirements necessary for running this system:
- docker
- docker-compose
- nvidia-docker*
- Postgres OS Libs**
Nvidia-docker*: Required only if using a GPU. If using the latest version of nvidia-docker, it's also necessary to install nvidia-container-runtime and configure /etc/docker/daemon.json with the nvidia runtime (reference); this is needed to maintain backward compatibility and to avoid compatibility issues with docker-compose.
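As a sketch of that daemon.json setup (based on the generic nvidia-container-runtime instructions, not on anything specific to this project; setting nvidia as the default runtime is one common way of making GPU services work through docker-compose, but follow the referenced documentation for the exact setup):
# NOTE: merge this with any existing content of /etc/docker/daemon.json instead of overwriting it
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker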
Postgres OS Libs**: Only necessary if installing locally for development instead of using the docker image. Eg of such a library (Ubuntu): libpq-dev
It's necessary to log in to our GitLab container registry in order to have access to the docker images.
docker login registry.insight-centre.org
Copy the example.env file into a new file called .env and update the following variables:
These should match our private PyPI repository's username and password.
Self-explanatory, but it's advisable to actually create and use a PERSONAL_ACCESS_TOKEN.
This should be set to http://<host>:<port>/api/v1.0/set_result, replacing <host> and <port> with your machine's IP address and the port on which the system will be running (5000 by default).
Eg: http://123.456.789.10:5000/api/v1.0/set_result
If you want the benchmark to run with GPU-enabled services, add USE_GPU=1 to the .env file.
Otherwise, if you don't want GPU-enabled services, remove the variable from the file.
Determines how long (in seconds) the system should wait after starting up the target system and the benchmark system, respectively. This helps ensure that all services are fully up and running before the benchmark starts its tasks.
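A sketch of a filled-in .env. The variable names SIT_PYPI_USER, SIT_PYPI_PASS, GITLAB_USER, GITLAB_PASS, WEBHOOK_BASE_URL and USE_GPU all appear elsewhere in this document, but their pairing with the descriptions above is an assumption; check example.env for the authoritative names, including those of the startup wait times:
# private PyPI repository credentials (used with pip's --extra-index-url below)
SIT_PYPI_USER=<pypi-user>
SIT_PYPI_PASS=<pypi-password>
# GitLab credentials; a PERSONAL_ACCESS_TOKEN is assumed here to go into GITLAB_PASS
GITLAB_USER=<gitlab-user>
GITLAB_PASS=<personal-access-token>
# webhook the benchmark results are posted back to
WEBHOOK_BASE_URL=http://123.456.789.10:5000/api/v1.0/set_result
# keep this line only if GPU-enabled services are wanted
USE_GPU=1
# the startup wait times (in seconds) also live here; see example.env for their exact names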
Copy the example-configs.json file into a new file called configs.json and update its content to represent the default benchmark that will be executed on each run.
The result_webhook variable is currently overridden for each benchmark execution, so changing it is not required.
More information on how to write this file can be found in the Benchmark Tools docs.
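A minimal way to create the file, assuming the defaults from the example file are an acceptable starting point (the result_webhook inside it can stay untouched, since it is overridden per execution):
cp example-configs.json configs.json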
If you wish to enable benchmarks that run across multiple nodes (eg: part of the Target system running on one machine, and another part on a second machine), then you need to configure these extra nodes.
For that, you need to configure the ansible inventory (copy ansible-files/example-inventory.ini to ansible-files/inventory.ini as a starting point), with one line for localhost and one line for each extra node, for example:
[bm_nodes]
localhost ansible_connection=local ansible_host=127.0.0.1
jetson ansible_connection=paramiko ansible_host=192.168.122.217
<other_node_label> ansible_connection=paramiko ansible_host=<other_node_address>
Next, you'll need to add a new YAML file in ansible-files/host_vars with a name that matches the new node label (as described in the inventory file). Eg for ansible-files/host_vars/jetson.yml:
---
target_system_dest_dir: /tmp/benchmark
ansible_python_interpreter: python3.6
ansible_user: <some user>
ansible_ssh_pass: <some password>
PS: For now the only extra node the system accepts is "jetson", but in the future this will be open to other nodes.
First pull all the images:
docker-compose pull
Then, start the containers:
If you want GPU enabled, execute the following command:
docker-compose -f docker-compose.yml -f docker-compose-gpu.yml up -d
Otherwise, execute:
docker-compose up -d
PS: If using docker, pay attention to the IP addresses used in the configurations, so that the benchmark system is able to talk to the target system and to send the results to the WEBHOOK_BASE_URL.
To install and run locally for development instead of using the docker images:
Run $ pipenv shell to create a python virtualenv and load the .env file into the environment variables in the shell.
Then run $ pipenv install to install all packages, or $ pipenv install -d to also install the packages that help during development, eg: ipython.
This runs the installation using pip under the hood, but it also handles the cross-dependency issues between packages and checks the packages' MD5 hashes as a security measure.
To install using pip directly, one needs to use the --extra-index-url option when running the pip install command, in order to be able to use our private PyPI repository.
Load the environment variables from the .env file using source load_env.sh.
To install from the requirements.txt file, run the following command:
$ pip install --extra-index-url https://${SIT_PYPI_USER}:${SIT_PYPI_PASS}@sit-pypi.herokuapp.com/simple -r requirements.txt
Have Postgres and redis running (docker-compose up -d db tasks-redis), then execute both run_webservice.sh and run_worker.sh.
All communication with the platform controller is done through the web API at http://<host>:<port>/api/v1.0, replacing <host> and <port> with your machine's IP address and the port on which the system is running (5000 by default).
To run a benchmark, just send an HTTP POST to the following API endpoint: /api/v1.0/run_benchmark.
Example of payload:
{
"override_services": {
"namespace-mapper": {
"image": "registry.insight-centre.org/sit/mps/namespace-mapper:some-tag"
},
"forwarder": {
"image": "registry.insight-centre.org/sit/mps/forwarder:other-tag",
"cpus": "2.5"
}
},
"target_system":{
"version": "1.0.0"
}
}
In this example, the system would start up the target system (the MPS Node project) using version "1.0.0" as a basis, but replacing the Namespace-Mapper docker image tag with some-tag, replacing the Forwarder docker image tag with other-tag, and changing the Forwarder's docker cpus configuration to 2.5.
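For example, this exact payload could be sent with wget as follows (the host address is illustrative, matching the one used later in this document):
wget -O- --post-data='{"override_services": {"namespace-mapper": {"image": "registry.insight-centre.org/sit/mps/namespace-mapper:some-tag"}, "forwarder": {"image": "registry.insight-centre.org/sit/mps/forwarder:other-tag", "cpus": "2.5"}}, "target_system": {"version": "1.0.0"}}' --header='Content-Type:application/json' 'http://10.2.16.176:5000/api/v1.0/run_benchmark'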
If target_system is empty it will use the latest version from the master branch of the target system.
If override_services is empty, it will use the specified version as it is.
{
"override_services": {}
}
Another example, overriding more options (docker image, memory limit and environment variables) of the object-detection service:
{
"override_services": {
"object-detection": {
"image": "registry.insight-centre.org/sit/mps/content-extraction-service:game-demo",
"mem_limit": "800mb",
"environment": [
"DNN_WEIGHTS_PATH=/content-extractor/content_extraction_service/dnn_model/yolo_coco_v3/yolo.h5"
]
}
}
}
Another example, overriding a service called new-service and pinning the target system to version v1.1.0:
{
"override_services": {
"new-service": {
"image": "registry.insight-centre.org/sit/mps/my-new-service:some-tag",
"mem_limit": "800mb",
"environment": [
"PYTHONUNBUFFERED=0",
"SERVICE_STREAM_KEY=whatever-data",
"SERVICE_CMD_KEY=whatever-cmd",
"LOGGING_LEVEL=DEBUG"
]
}
},
"target_system":{
"version": "v1.1.0"
}
}
It is possible to specify a different Gnosis git repository to be used for the benchmark (i.e. a fork of the MPS node project). It is important to remember that all the necessary GitLab permissions must be configured for the CI user (GITLAB_USER and GITLAB_PASS configs) in this fork of the project as well. To override the Target System git repository, just provide ['target_system']['git_repository'] in the payload; the git repository address (HTTP address, without the http:// part) has to be set as in the following example: gitlab.insight-centre.org/SIT/mps/mps-node.git.
{
"target_system": {
"git_repository": "gitlab.insight-centre.org/SIT/mps/mps-node.git"
}
}
A custom benchmark configuration can also be provided through the benchmark key of the payload:
{
"override_services": {
},
"target_system":{
},
"benchmark": {
//... benchmark tools valid config json goes in here
}
}
It is possible to specify a given git tag/branch/commit hash for the benchmark-tools project, which makes sure that the benchmark is executed with that specific version of the benchmark tools. However, the specified version needs to be configured to use a different docker image.
{
"override_services": {
},
"target_system":{
},
"benchmark": {
"benchmark-version": "some-git-hash/tag",
//... rest of the benchmark tools valid config json goes in here
}
}
Datasets for the benchmark can also be listed in the payload. These datasets need to be available in the ./datasets directory, and the DATASETS_PATH_ON_HOST env var needs to be set to the absolute path of this directory on the HOST machine.
{
"override_services": {
},
"target_system":{
},
"datasets": [
"coco2017-val-300x300-30fps.flv",
"coco2017-val-300x300-60fps.flv"
]
}
Any dataset listed there will be made available as a VOD (Video On Demand) at the media-server on the url rtmp://<machine_ip>/vod2/<dataset-name>. Eg: rtmp://172.17.0.1/vod2/coco2017-val-300x300-30fps.flv.
This configuration only makes the dataset videos available; it doesn't configure a publisher for them. To do that, one needs to use a custom benchmark configuration with the appropriate data to use this dataset (eg: registering a publisher that uses the VOD url for that dataset).
To execute the benchmark on an extra node that has been previously configured in the BPC (eg: Jetson), one only needs to update the payload with the extra_nodes key. In this extra_nodes dictionary the user must indicate which extra node to use (eg: jetson) and the configurations for that node. These follow the same pattern as the configurations for the main node, with the exception of an extra option, start_only_services, which can be used to specify the (space-separated) list of services that will be started; otherwise all services from that extra node will be started. It is also important to properly configure the extra node so that it connects to the main node's redis and jaeger.
The example below represents a benchmark that starts the main node without running the object-detection-ssd service; instead, it runs the object-detection-ssd on the jetson node, connected to the main node's redis and jaeger (the IP addresses are just examples, and may not represent the final ones used in production):
{
"target_system":{
"version": "v2.1.0"
},
"override_services": {
"object-detection-ssd": {
"command": "echo ok"
}
},
"extra_nodes":{
"jetson": {
"override_services": {
"object-detection-ssd": {
"environment": [
"REDIS_ADDRESS=192.168.100.115",
"TRACER_REPORTING_HOST=192.168.100.115"
]
}
},
"target_system": {
"version": "v2.1.0"
},
"start_only_services": "object-detection-ssd"
}
}
}
Sending this request should give back one of two possible responses:
{"result_id": "<result_id>"}
or
{"wait": "<wait-time>"}
If a wait is present, it means that the platform is currently busy executing another benchmark, and the value of this variable is how long one should wait before asking again (just to avoid flooding the system with multiple requests at a time).
If a result_id is present, this means that the benchmark has started, and the value of this variable is the ID for this execution result. This ID will be necessary in order to get the output of the execution.
With the result_id in hand, one can make a GET request to the API endpoint: /api/v1.0/get_result/<result_id>, replacing <result_id> with the actual result_id that one wishes to get information about.
The response will be a json containing:
{
"status": "RUNNING|FINISHED|CLEANUP",
"result": ...
}
Where the status indicates whether the benchmark is in one of the phases: RUNNING, CLEANUP (finished, but still needs to clean up the environment) or FINISHED.
The result will contain the Benchmark Tools result for this execution, but this variable will be empty while the benchmark has not yet completed (eg: if status is RUNNING).
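For example, with wget (the host and result_id are illustrative, reusing the values from the set_result example further below):
wget -O- 'http://10.2.16.176:5000/api/v1.0/get_result/749320ef-a52d-4131-8c62-aa09497eb904'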
You probably don't want to use the set_result endpoint yourself, since only the Benchmark system should be the one to give the result of a benchmark, but for the sake of completeness it is documented here as well.
To set a result for a benchmark execution, one just needs to make an HTTP POST request to this API endpoint: /api/v1.0/set_result/<result_id>, replacing <result_id> with the result id of the benchmark execution you intend to set a result for.
After the result is set, the Benchmark Platform Controller will start the process of cleaning up the execution environment, and only after this task is done will the system be clear to perform another benchmark.
This endpoint is also useful for getting the Benchmark Platform Controller unstuck if the latest execution had some problem and can't finish up by itself.
The payload is a json, and its content represents the benchmark result. Eg:
{
"some": "result"
}
Example of invoking the endpoint on the Benchmark Server using wget:
wget -O- --post-data='{"some": "results"}' --header='Content-Type:application/json' 'http://10.2.16.176:5000/api/v1.0/set_result/749320ef-a52d-4131-8c62-aa09497eb904'
PS: Again, you probably don't want to use this endpoint; only the benchmark system (Benchmark Tools) should use it to push the results back to the platform controller.
The test_benchmark.py script (in the root directory of this project) is used by the Gitlab CI to run the benchmark stage, but one can download this file and run it (Python 3.6+) in order to send benchmarks to a Benchmark Platform Controller. To do so, just execute:
python test_benchmark.py http://<host>:<port> <service_name> <docker_image> <tag>
Where <host> and <port> are the host and port where the Benchmark Platform Controller is running, <service_name> is the service name (as defined in the MPS node docker-compose.yml file), and <docker_image> and <tag> are the docker image and tag that will replace the service's default ones during the benchmark execution.
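For instance, reusing the forwarder example from earlier in this document (the host, image and tag are illustrative values):
python test_benchmark.py http://10.2.16.176:5000 forwarder registry.insight-centre.org/sit/mps/forwarder other-tag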