This repository tracks issues intended for the Planetary Data System (PDS) Engineering Node (EN) Operations Team. These issues may include, but are not limited to, PDS4 NSSDCA deliveries via PDS Deep Archive, data releases, website updates, or other actions for which a corresponding GitHub repository is unknown.
For help with the PDS Engineering Node, either create a ticket in GitHub Issues or email [email protected].
This section specifies the requirements needed to run the software in this repository and gives narrative instructions on performing the installation.
Prior to installing this software, ensure your system meets the following requirements:
- Python 3: This software requires Python 3. Python 2 will not work, as it reached its end of life in January 2020.
Consult your operating system documentation or your system administrator to install the required packages. If you don't have system administrator access, you can instead try a local (home directory) Python 3 installation using a Miniconda installation.
We will install the operations tools using Pip, the Python Package Installer. If you have Python on your system, you probably already have Pip; run `pip --help` or `pip3 --help` to check.
It's best to install the tools into a virtual environment, so they won't interfere with—or be interfered with by—other packages. To do so:
```console
$ # Clone the repo or do a git pull if it already exists
$ git clone https://github.com/NASA-PDS/pdsen-operations.git
$ cd pdsen-operations
$ # For Linux, macOS, or other Unix systems:
$ mkdir -p $HOME/.virtualenvs
$ python3 -m venv $HOME/.virtualenvs/pdsen-ops
$ source $HOME/.virtualenvs/pdsen-ops/bin/activate
$ pip3 install --requirement requirements.txt
```

The pds-stats.py script can be used to get the total download metrics for GitHub software tools. Here is an example of how to get metrics for the Validate, MILabel, and Transform tools.
For usage information, run `bin/pds-stats.py --help`.
1. Activate your virtual environment:

   ```console
   $ source $HOME/.virtualenvs/pdsen-ops/bin/activate
   ```

2. Execute the script:

   ```console
   $ bin/pds-stats.py --github_repos validate mi-label transform --token $GITHUB_TOKEN
   ```
This utility is used to autonomously generate the data dictionaries web page for each PDS4 Build.
This software determines all the discipline LDDs to be included with this release, auto-generates the web page, and downloads and stages all the discipline LDDs from the LDD GitHub repos.
The ldd-corral configuration can be modified to add additional discipline LDDs to the workflow.
Format:

```yaml
<github-repo-name>:
  name: a title to be used in the output web page that overrides the <name> from the repo IngestLDD
  description: |
    description here
```
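For example, a hypothetical entry might look like the following (the repository name, title, and description here are illustrative placeholders, not taken from the actual configuration):

```yaml
ldd-geom:
  name: Geometry
  description: |
    Discipline LDD for describing observational geometry.
```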
For the latest usage information:

```console
$ bin/ldds/ldd-corral.py --help
```
Base usage example (note: the GITHUB_TOKEN environment variable must be set):

```console
$ source $HOME/.virtualenvs/pdsen-ops/bin/activate
$ ldd-corral.py --pds4_version 1.15.0.0 --token $GITHUB_TOKEN
```

Default outputs:

- Web page: `/tmp/ldd-release/dd-summary.html`
- Discipline LDDs: `/tmp/ldd-release/pds4`
The LDD utility script `prep_for_ldd_release.sh` is usually run as follows:

1. Execute the `bin/prep_for_ldd_release.sh` script as follows to create new branches in all Discipline LDD repositories:

   TBD

2. Go to each Discipline LDD repo and create Pull Requests for each new branch (branch names like `IM_release_1.15.0.0`):

   - PR Title: `PDS4 IM Release <IM_version>`
   - PR Description:

     ```
     ## Summary
     PR for testing LDD with new IM release.
     ```

   - PR Labels: `release`

3. If the build fails on a new branch, contact the LDD Steward to investigate a potential regression test failure or incompatibility with the new IM version.
Download ESA PSA product XML files from the search API.

```
options:
  -h, --help            show this help message and exit
  -n NODE_NAME, --node-name NODE_NAME
                        Name of the node (default psa)
  -p DOWNLOAD_PATH, --download-path DOWNLOAD_PATH
                        Where to create the XML files (default download)
  -u URL, --url URL     URL of the PDS product search API (default
                        https://pds.mcp.nasa.gov/api/search/1/products)
  -c CONFIG, --config CONFIG
                        What to call the harvest XML config output (default harvest.cfg)
```
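A minimal invocation using the documented defaults might look like the following; the script path is not shown in the help text above, so `<path-to-script>` is a placeholder:

```console
$ python3 <path-to-script> --node-name psa --download-path download \
    --url https://pds.mcp.nasa.gov/api/search/1/products --config harvest.cfg
```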
This script monitors the status of PDS4 packages in NSSDCA and updates GitHub issues accordingly.
- Reads package information from a CSV file
- Checks NSSDCA API for package status
- Updates GitHub issues with status comments
- Sends email notifications for failed packages
- Updates CSV file with new statuses
- Closes issues when all packages are ingested
- Python 3.6 or higher
- GitHub token with repo access
- Email password for [email protected]
- Clone this repository
- Install the required packages:

  ```console
  $ pip install -r requirements.txt
  ```
Set the following environment variables:
- `GITHUB_TOKEN`: Your GitHub personal access token
- `EMAIL_PASSWORD`: Password for the [email protected] email account
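For example, in a POSIX shell (placeholder values shown):

```console
$ export GITHUB_TOKEN=<your-github-token>
$ export EMAIL_PASSWORD=<pds-operator-email-password>
```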
The script expects a CSV file named `nssdca_status.csv` with the following columns:

- `github_issue_number`: The GitHub issue number
- `identifier`: The package identifier (e.g., urn:nasa:pds:gbo.ast.catalina.survey::1.0)
- `nssdca_status`: Current NSSDCA status of the package
Example:

```csv
github_issue_number,identifier,nssdca_status
629,urn:nasa:pds:gbo.ast.catalina.survey::1.0,proffered
```
Run the script:

```console
$ python nssdca_status_checker.py
```

The script will:
- Read the CSV file
- Check NSSDCA status for each package
- Update GitHub issues with comments
- Send email notifications for failed packages
- Update the CSV file with new statuses
- Close issues when all packages are ingested
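The overall flow can be pictured with a simplified sketch. This is a hypothetical illustration of the steps listed above, not the actual implementation; in particular, the NSSDCA status endpoint shown is a placeholder:

```python
import csv
import json
import os
import urllib.request

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
REPO = "NASA-PDS/operations"  # per the notes below, all issues are in this repo
NSSDCA_API = "https://example.gov/nssdca/status"  # placeholder, not the real endpoint


def check_status(identifier: str) -> str:
    """Query the archive's API for a package's status (placeholder URL)."""
    with urllib.request.urlopen(f"{NSSDCA_API}?identifier={identifier}") as resp:
        return json.load(resp)["status"]


def comment_on_issue(issue_number: str, body: str) -> None:
    """Post a status comment using the GitHub issues API."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{REPO}/issues/{issue_number}/comments",
        data=json.dumps({"body": body}).encode(),
        headers={"Authorization": f"token {GITHUB_TOKEN}"},
        method="POST",
    )
    urllib.request.urlopen(req)


# Read the CSV, check each package, comment on changes, then write back
with open("nssdca_status.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    status = check_status(row["identifier"])
    if status != row["nssdca_status"]:
        comment_on_issue(
            row["github_issue_number"],
            f"NSSDCA status for {row['identifier']}: {status}",
        )
        row["nssdca_status"] = status

with open("nssdca_status.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["github_issue_number", "identifier", "nssdca_status"]
    )
    writer.writeheader()
    writer.writerows(rows)
```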
- Failed API calls are logged
- Email sending errors are logged
- Invalid CSV data is logged
- GitHub API errors are logged
- The script assumes all issues are in the NASA-PDS/operations repository
- Email notifications are sent to [email protected]
- Project board status updates require additional GitHub API configuration
This directory contains operational scripts and tools for managing PDS4 context products.
Location: `bin/context/check_duplicate_identifiers.py`
A Python script that checks for duplicate logical_identifier values in PDS4 context XML files.
- Recursively scans all XML files in the `data/pds4/context-pds4` directory
- Extracts `logical_identifier` values from the `Identification_Area` section
- Reports any duplicate identifiers found
- Follows PEP8, linting, and Black formatting standards
- Includes comprehensive error handling and logging
- Returns appropriate exit codes for automation
- Python 3.8 or higher
- Standard library modules only (no external dependencies required)
Run the script from the operations directory:

```console
cd operations

# Check the default directory (data/pds4/context-pds4)
python3 bin/context/check_duplicate_identifiers.py

# Check a specific directory
python3 bin/context/check_duplicate_identifiers.py /path/to/xml/files

# Check with verbose output
python3 bin/context/check_duplicate_identifiers.py --verbose

# Check a specific directory with verbose output
python3 bin/context/check_duplicate_identifiers.py /path/to/xml/files --verbose
```

If no duplicates are found:
```
Scanning 1234 XML files in ../../../data/pds4/context-pds4...
✅ No duplicate logical_identifiers found!
```
If duplicates are found:

```
Scanning 1234 XML files in /path/to/xml/files...
❌ DUPLICATE LOGICAL_IDENTIFIERS FOUND:
==================================================
Logical Identifier: urn:nasa:pds:context:facility:laboratory.aps
Found in 2 files:
  - /path/to/xml/files/facility/laboratory.aps_1.0.xml
  - /path/to/xml/files/facility/laboratory.aps_1.1.xml

Total duplicate identifiers: 1
```
With verbose output:

```
Scanning 1234 XML files in /path/to/xml/files...
Found: urn:nasa:pds:context:facility:laboratory.aps in /path/to/xml/files/facility/laboratory.aps_1.0.xml
Found: urn:nasa:pds:context:facility:laboratory.aps in /path/to/xml/files/facility/laboratory.aps_1.1.xml
Found: urn:nasa:pds:context:target:planetary_system.solar_system in /path/to/xml/files/target/planetary_system.solar_system_1.0.xml
...
```
Exit codes:

- `0`: No duplicates found
- `1`: Duplicates found or error occurred
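Because the exit code distinguishes success from failure, the script can gate automated checks; a minimal sketch of such a CI step (the shell wrapper here is illustrative):

```console
cd operations
if ! python3 bin/context/check_duplicate_identifiers.py; then
    echo "Duplicate logical_identifiers detected; failing the build."
    exit 1
fi
```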
Format the code with Black:

```console
cd operations
black bin/context/check_duplicate_identifiers.py
```

Check code style with flake8:

```console
cd operations
flake8 bin/context/check_duplicate_identifiers.py
```

Run mypy for type checking:

```console
cd operations
mypy bin/context/check_duplicate_identifiers.py
```

Run the test suite:

```console
cd operations
pytest test/context/test_check_duplicate_identifiers.py -v
```

How it works (a simplified sketch follows this list):

- File Discovery: Recursively finds all `.xml` files in the target directory
- XML Parsing: Uses `xml.etree.ElementTree` to parse each XML file
- Identifier Extraction: Looks for `logical_identifier` elements in the `Identification_Area` section
- Namespace Handling: Supports both namespaced and non-namespaced XML
- Duplicate Detection: Uses a `defaultdict` to track which files contain each identifier
- Reporting: Provides detailed output showing all duplicates and their locations
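A condensed, hypothetical sketch of that core logic (illustrative only, not the script's exact code):

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path
from typing import Optional

PDS_NS = "{http://pds.nasa.gov/pds4/pds/v1}"


def find_identifier(root: ET.Element) -> Optional[str]:
    """Return the logical_identifier from Identification_Area, if present."""
    # Try the namespaced form first, then the non-namespaced form
    for area_tag in (f"{PDS_NS}Identification_Area", "Identification_Area"):
        area = root.find(area_tag)
        if area is None:
            continue
        for lid_tag in (f"{PDS_NS}logical_identifier", "logical_identifier"):
            lid = area.find(lid_tag)
            if lid is not None and lid.text:
                return lid.text.strip()
    return None


def main(target: str) -> int:
    seen = defaultdict(list)  # identifier -> list of files containing it
    for xml_file in Path(target).rglob("*.xml"):
        try:
            root = ET.parse(xml_file).getroot()
        except ET.ParseError:
            continue  # the real script reports malformed XML and moves on
        identifier = find_identifier(root)
        if identifier:
            seen[identifier].append(xml_file)
    duplicates = {lid: files for lid, files in seen.items() if len(files) > 1}
    for lid, files in duplicates.items():
        print(f"Logical Identifier: {lid}")
        for path in files:
            print(f"  - {path}")
    return 1 if duplicates else 0


if __name__ == "__main__":
    target_dir = sys.argv[1] if len(sys.argv) > 1 else "data/pds4/context-pds4"
    sys.exit(main(target_dir))
```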
The script handles various error conditions gracefully:
- Missing or malformed XML files
- Files without `logical_identifier` elements
- Empty `logical_identifier` values
- Permission errors when reading files
The script expects XML files with this structure:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Product_Context xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <Identification_Area>
    <logical_identifier>urn:nasa:pds:context:facility:laboratory.aps</logical_identifier>
    <version_id>1.1</version_id>
    <title>Argonne National Laboratory Advanced Photon Source</title>
    <!-- ... other elements ... -->
  </Identification_Area>
  <!-- ... rest of document ... -->
</Product_Context>
```

When contributing to these scripts:
- Follow PEP8 style guidelines
- Use Black for code formatting
- Add type hints to all functions
- Write tests for new functionality
- Update documentation as needed