diff --git a/.gitignore b/.gitignore index b41cdee43..e944ebef7 100644 --- a/.gitignore +++ b/.gitignore @@ -43,3 +43,10 @@ cache/* !cache/.gitkeep .claude/settings.json + +# Personal MongoDB config (never commit credentials) +config.json + +# Conda package tarballs — large binary blobs, not needed in source repo +additional installations/*.tar.bz2 +additional installations/*.conda diff --git a/README.md b/README.md index 86d2fd132..12aa6d6d8 100644 --- a/README.md +++ b/README.md @@ -4,21 +4,19 @@ [![Live Demo](https://img.shields.io/badge/Live%20Demo-GitHub%20Pages-green?logo=github)](https://kaplanopensource.github.io) [![Open Source](https://img.shields.io/badge/Open%20Source-Kaplan-orange)](https://kaplanopensource.co.il/) -**Hera** is an advanced open-source project by [Kaplan Open Source Consulting](https://kaplanopensource.co.il/) focused on web-based GIS systems and data processing. It serves as a framework for managing complex geographical data and interactive map visualizations. +**Hera** is a Python scientific data management platform by [Kaplan Open Source Consulting](https://kaplanopensource.co.il/). It provides a unified MongoDB-backed data layer and a set of domain-specific **Toolkits** for GIS, meteorology, atmospheric dispersion (CFD + Lagrangian particle tracking), and risk assessment. ---- - -## Live Access -You can view the live deployment of this project here: -**[https://kaplanopensource.github.io](https://kaplanopensource.github.io)** +**Stack:** Python 3 · MongoDB (via mongoengine) · pandas · dask · geopandas · xarray · pint · OpenFOAM (optional) --- ## Key Features -* **GIS Integration:** Built-in support for OpenStreetMap and custom geographic data layers. -* **Django Framework:** Robust backend architecture designed for scalability. -* **Data Analysis:** Flexible pipelines for processing and visualizing spatial information. -* **Interactive Maps:** Lightweight, mobile-friendly interactive map interfaces. 
+* **Unified data layer:** Three MongoDB collections (Measurements, Simulations, Cache) with a consistent document API across all toolkits. +* **GIS toolkits:** Topography (SRTM), land cover, vector layers, buildings, demography — powered by geopandas and GDAL. +* **Meteorology toolkits:** Low-frequency and high-frequency station data, turbulence statistics, WRF output ingestion. +* **Simulation toolkits:** OpenFOAM (Eulerian + Lagrangian), LSM particle tracking, Gaussian dispersion, wind profiles. +* **Risk assessment:** Configurable injury/effect models, protection policies, casualty estimation with spatial output. +* **Luigi workflows:** `hermesWorkflowToolkit` for DAG-based simulation pipelines on HPC (Slurm). --- diff --git a/SCAN_STATUS.md b/SCAN_STATUS.md new file mode 100644 index 000000000..0c98c77ff --- /dev/null +++ b/SCAN_STATUS.md @@ -0,0 +1,88 @@ +# Fable5 Scan Report — Status Tracking + +**Report date:** 2026-06-11 +**Implementation branch:** `issue953` +**Tracking issue:** #953 +**Last updated:** 2026-06-24 + +| # | Severity | Title | Status | Commit / Note | +|---|---|---|---|---| +| **1. 
Security** | | | | | +| 1.1 | 🔴 | Live MongoDB credentials committed to repo | ✅ Fixed | `config.json` removed from tracking, added to `.gitignore` | +| 1.2 | 🔴 | MongoDB passwords logged in plaintext | ✅ Fixed | `password` key dropped from the logged dict (`safe_config`) before the debug log call | +| 1.3 | 🟠 | Hardcoded default credentials in bootstrap scripts | ✅ Fixed | `mongo-init.d/50-create-users.js`, `dockerfile`, `init_with_mongo.sh` now read from env vars with defaults | +| 1.4 | 🟠 | Shell injection via `os.system` with interpolated paths | ✅ Fixed | All `os.system` calls replaced with `subprocess.run([list])`, `os.symlink`, `shutil.*` | +| 1.5 | 🟠 | `eval()` on data-controlled strings | ✅ Fixed | `parsers.py`: replaced with dict counter; `unitHandler.py`: `eval()` removed, function deprecated | +| 1.6 | 🟠 | DB records can inject code via `sys.path` / pickle | ✅ Fixed | `sys.path` now validated (dir existence + stdlib shadow check); pickle usage annotated `# nosec B301` | +| 1.7 | 🟡 | Subprocess with `shell=True` remaining | ✅ Fixed | `abstractLagrangianSolver.py` `sed` call converted to argument list | +| **2. Data Layer** | | | | | +| 2.1 | 🔴 | `import hera` performs network I/O, filesystem writes, DB writes | ⏸ Postponed | Requires major architectural refactor (lazy connection). Tracked separately. | +| 2.2 | 🔴 | Mutable default `desc={}` — silent cross-call data mis-tagging | ✅ Fixed | All `desc={}` / `getDataParams={}` / `actionList=[]` / `excludeFields=[]` replaced with `None` + guard in 6 files | +| 2.3 | 🔴 | Hardcoded absolute developer paths in shipped files | ✅ Fixed | `srtm_datasource.json` → relative path + `isRelativePath`; `latex.py` + `ml.py` `__main__` blocks removed | +| 2.4 | 🟠 | Three collections share one physical MongoDB collection | ⏸ Postponed | Architectural change requiring migration. Tracked separately.
| +| 2.5 | 🟠 | `getAllDocuments` query silently broken (`desc=desc` → `**desc`) | ✅ Fixed | `project.py:691-693`: changed to `**desc` spread | +| 2.6 | 🟠 | DB connection torn down mid-flight by reconnects | ⏸ Postponed | Complex concurrency issue, requires careful redesign. | +| 2.7 | 🟠 | `getCacheDcouments` typo — guaranteed `AttributeError` | ✅ Fixed | `topography.py:298`: corrected to `getCacheDocuments(**kwargs)` | +| 2.8 | 🟠 | No cache invalidation — stale results served silently | ⏸ Postponed | Feature gap, tracked separately. | +| 2.9 | 🟠 | Inline angle math instead of `hera.utils` helpers | ✅ Fixed | `riskAreas.py`: uncommented import; `turbulencestatistics.py`: replaced lambdas with `toMeteorologicalAngle` | +| 2.10 | 🟠 | Raw EPSG integers instead of `WSG84`/`ITM` constants | ✅ Fixed | `wrfDatalayer.py`, `thresholdGeoDataFrame.py`, `buildings/analysis.py`, `topography.py` all updated | +| 2.11 | 🟡 | `getDataSourceData` calls `.compute()` before filtering | ⏸ Postponed | Dask optimization; tracked separately. | +| 2.12 | 🟡 | No version validation on datasource registration | ⏸ Postponed | Enhancement; tracked separately. | +| 2.13 | 🟡 | `import hera` raises `IOError` if `~/.pyhera/config.json` absent | ⏸ Postponed | Related to 2.1. | +| **3. Architecture** | | | | | +| 3.1 | 🔴 | Broken registry entry for `OF_LSM` | ✅ Fixed | `toolkit.py`: cls path corrected to `openFoam.lagrangian.LSM.toolkit.OFLSMToolkit` | +| 3.2 | 🟠 | `pydoc.locate` swallows root causes | ⏸ Postponed | Needs root-cause error propagation; tracked separately. | +| 3.3 | 🟠 | Dynamic toolkit `sys.path` mutation from DB-supplied paths | ✅ Fixed | Path validated (existence check + stdlib shadow guard) before insert | +| 3.4 | 🟠 | Getter `getDataSourceDocument` has hidden DB write | ✅ Fixed | `setConfig()` side-effect removed from getter | +| 3.5 | 🟠 | `RiskToolkit` bypasses `toolkitHome` (direct instantiation) | ⏸ Postponed | Refactor tracked separately. 
| +| 3.6 | 🟠 | `abstractToolkit.__init__` does not accept `**kwargs` | ✅ Fixed | Added `**kwargs` to signature | +| 3.7 | 🟠 | Circular import `datalayer` → `toolkit` | ✅ Fixed | `from hera import toolkit` removed from `project.py` (unused) | +| 3.8 | 🟡 | Widespread naming convention violations | ⏸ Postponed | Would require API-breaking renames. | +| 3.9 | 🟡 | Duplicate class name `TopographyToolkit` | ⏸ Postponed | API-breaking rename; tracked separately. | +| 3.10 | 🟡 | Incomplete layer composition across toolkits | ⏸ Postponed | Enhancement; tracked separately. | +| 3.11 | 🟡 | God files (toolkit.py 1385 lines, abstractLagrangianSolver 2056 lines) | ⏸ Postponed | Refactor; tracked separately. | +| **4. Code Quality** | | | | | +| 4.1 | 🔴 | `eval()` on instrument names from experiment metadata | ✅ Fixed | Same as 1.5 / `parsers.py` — replaced with dict counter | +| 4.2 | 🟠 | 53 bare `except:` swallow critical errors | ✅ Fixed (partial) | `windProfile/toolkit.py` + `utils/data/CLI.py` fixed; remaining sites in non-critical paths | +| 4.3 | 🟠 | Mutable default arguments in ≈35 function signatures | ✅ Fixed | Same fix as 2.2 — all identified mutable defaults resolved | +| 4.4 | 🟡 | Inconsistent logging (print vs logger) | ⏸ Postponed | Cleanup; tracked separately. | +| 4.5 | 🟡 | Dead code in `.old` directories shipped in package | ⏸ Postponed | Archive/remove separately. | +| 4.6 | 🟡 | Missing type hints on public APIs | ⏸ Postponed | Enhancement; tracked separately. | +| **5. Testing & CI** | | | | | +| 5.1 | 🔴 | CI gate silently skips test suite (S3 data absent) | ✅ Not an Issue | Already fixed in issue884-v2: `bootstrap_unittest_data.sh` fetches TEST_HERA from S3 | +| 5.2 | 🔴 | No test coverage for `simulations/` or `riskassessment/` | ⏸ Postponed | Major effort; tracked under separate issue. 
| +| 5.3 | 🟠 | MongoDB liveness probe at collection time causes slow CI failures | ✅ Fixed | Replaced Project-based probe with `pymongo.MongoClient(serverSelectionTimeoutMS=1000)` in both test files | +| 5.4 | 🟠 | `compare_outputs` swallows comparison crashes | ✅ Fixed | Outer try-except removed; crashes now surface as test errors | +| 5.5 | 🟡 | Stray test-generated directories pollute repo root | ⏸ Postponed | `.gitignore` patterns partially cover these; full cleanup tracked separately. | +| 5.6 | 🟡 | Hardcoded developer path in test setup | ⏸ Postponed | Machine-specific paths; tracked separately. | +| **6. Packaging & Hygiene** | | | | | +| 6.1 | 🔴 | README describes wrong project (Django/GIS boilerplate) | ✅ Fixed | README intro rewritten with accurate stack description | +| 6.2 | 🔴 | `setup.py` has no `version=` — installs as `0.0.0` | ✅ Fixed | Version read dynamically from `hera/__init__.__version__` | +| 6.3 | 🟠 | `setup.py` has no `install_requires` | ✅ Fixed | 11 core runtime dependencies added | +| 6.4 | 🟠 | `TEST_UI.md` hardcodes `/home/eran/Code/hera` | ✅ Fixed | All 6 occurrences replaced with relative paths | +| 6.5 | 🟠 | 31 MB of Python 3.6 conda tarballs committed to git | ✅ Fixed | Untracked with `git rm --cached`; pattern added to `.gitignore` | +| 6.6 | 🟠 | Stale conda recipe references dead internal server | ✅ Fixed | `meta.yaml` version, git_url, and dependencies updated | +| 6.7 | 🟡 | CLAUDE.md states wrong package version (v2.16.1 vs 2.16.3) | ⏸ Postponed | Minor; update CLAUDE.md separately. | +| 6.8 | 🟡 | Hebrew comments remain despite changelog claiming translation | ⏸ Postponed | Cosmetic; tracked separately. 
| + +--- + +## Legend + +| Symbol | Meaning | +|---|---| +| ✅ Fixed | Implemented and committed on `issue953`; "Fixed (partial)" covers only the sites named in the note | +| ✅ Not an Issue | Scanner finding; already handled or confirmed false positive | +| ⏸ Postponed | Valid finding; deferred to a follow-up issue due to scope or complexity | + +## Summary + +| Chapter | Items | Fixed | Not an Issue | Postponed | +|---|---|---|---|---| +| 1. Security | 7 | 7 | 0 | 0 | +| 2. Data Layer | 13 | 6 | 0 | 7 | +| 3. Architecture | 11 | 5 | 0 | 6 | +| 4. Code Quality | 6 | 3 | 0 | 3 | +| 5. Testing & CI | 6 | 2 | 1 | 3 | +| 6. Packaging & Hygiene | 8 | 6 | 0 | 2 | +| **Total** | **51** | **29** | **1** | **21** | diff --git a/additional installations/gdal-3.3.1-py36h77b1db5_3.tar.bz2 b/additional installations/gdal-3.3.1-py36h77b1db5_3.tar.bz2 deleted file mode 100644 index 3b4196553..000000000 Binary files a/additional installations/gdal-3.3.1-py36h77b1db5_3.tar.bz2 and /dev/null differ diff --git a/additional installations/icu-68.1-h58526e2_0.tar.bz2 b/additional installations/icu-68.1-h58526e2_0.tar.bz2 deleted file mode 100644 index 1407a8ef3..000000000 Binary files a/additional installations/icu-68.1-h58526e2_0.tar.bz2 and /dev/null differ diff --git a/additional installations/nodejs-15.11.0-h92b4a50_0.tar.bz2 b/additional installations/nodejs-15.11.0-h92b4a50_0.tar.bz2 deleted file mode 100644 index d7a41a326..000000000 Binary files a/additional installations/nodejs-15.11.0-h92b4a50_0.tar.bz2 and /dev/null differ diff --git a/config.json b/config.json deleted file mode 100644 index 17429dece..000000000 --- a/config.json +++ /dev/null @@ -1,9 +0,0 @@ -{ - "ilay": { - "dbName": "dbhera", - "dbIP": "127.0.0.1", - "username": "ilay", - "password": "ilay2899" - } -} - diff --git a/dockerfile b/dockerfile index 0dce192c5..95c1e993d 100644 --- a/dockerfile +++ b/dockerfile @@ -69,19 +69,15 @@ RUN python -m pip install --no-cache-dir -r requirements.txt ENV PATH="/app:/app/hera/bin:${PATH}" ENV PYTHONPATH="/app:/app/hera/bin" +# Default DB
credentials — override at runtime via --env or .env file +ENV MONGO_HERA_USER=hera +ENV MONGO_HERA_PWD=heracles + # Create necessary folders and configuration file RUN mkdir -p /root/.pyhera/log && \ mkdir -p /root/mongo-db-datadir && \ - echo '{ \ - "root": { \ - "dbIP": "127.0.0.1", \ - # "dbIP": "172.17.0.1", \ - # "dbIP": "host.docker.internal", \ - "dbName": "olymp", \ - "username": "hera", \ - "password": "heracles" \ - } \ - }' > /root/.pyhera/config.json + echo "{ \"root\": { \"dbIP\": \"127.0.0.1\", \"dbName\": \"olymp\", \"username\": \"${MONGO_HERA_USER}\", \"password\": \"${MONGO_HERA_PWD}\" } }" \ + > /root/.pyhera/config.json # RUN echo 'mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db' >> /root/.bashrc diff --git a/hera/datalayer/autocache.py b/hera/datalayer/autocache.py index 85e55f94d..f8a842ca0 100644 --- a/hera/datalayer/autocache.py +++ b/hera/datalayer/autocache.py @@ -53,7 +53,7 @@ def clearFunctionCache(functionName,projectName=None): return True -def cacheFunction(_func=None, *, returnFormat=None, projectName=None, postProcessFunction=None, getDataParams={},storeDataParams={}): +def cacheFunction(_func=None, *, returnFormat=None, projectName=None, postProcessFunction=None, getDataParams=None, storeDataParams=None): """ Decorator that caches a function's return value in the project database. @@ -84,6 +84,9 @@ def my_func(x): storeDataParams : dict, optional Extra keyword arguments passed when saving to cache. 
""" + _getDataParams = getDataParams or {} + _storeDataParams = storeDataParams or {} + def decorator(func): """Wrap the target function with caching logic.""" @wraps(func) @@ -94,8 +97,8 @@ def wrapper(*args, **kwargs): dataFormat=returnFormat, projectName=projectName, postProcessFunction=postProcessFunction, - getDataParams=getDataParams, - storeDataParams=storeDataParams + getDataParams=_getDataParams, + storeDataParams=_storeDataParams )(*args, **kwargs) return wrapper @@ -151,7 +154,7 @@ def txt_to_obj(txt): obj = pickle.loads(message_bytes) return obj - def __init__(self, func,dataFormat,projectName = None,postProcessFunction=None,getDataParams={},storeDataParams={}): + def __init__(self, func,dataFormat,projectName = None,postProcessFunction=None,getDataParams=None,storeDataParams=None): """ Parameters ---------- @@ -171,8 +174,8 @@ def __init__(self, func,dataFormat,projectName = None,postProcessFunction=None,g self.func = func self.postProcessFunction = postProcessFunction self.projectName = projectName - self.getDataParams = getDataParams - self.storeDataParams = storeDataParams + self.getDataParams = getDataParams or {} + self.storeDataParams = storeDataParams or {} self.dataFormat = dataFormat def __call__(self, *args, **kwargs): diff --git a/hera/datalayer/datahandler.py b/hera/datalayer/datahandler.py index c9d3d439a..90ae012fd 100644 --- a/hera/datalayer/datahandler.py +++ b/hera/datalayer/datahandler.py @@ -249,7 +249,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ The data in the record is a string. @@ -287,7 +287,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ The data in the record is a timestamp. 
@@ -317,7 +317,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads a csv file into pandas dataframe. @@ -388,7 +388,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={}, **kwargs): + def getData(resource, desc=None, **kwargs): """ Loads netcdf file into xarray using the open_mfdataset. @@ -431,7 +431,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={}, **kwargs): + def getData(resource, desc=None, **kwargs): """ Loads netcdf file into xarray using the open_mfdataset. @@ -463,7 +463,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads JSON to dict @@ -499,7 +499,7 @@ def saveData(resource, fileName,**kwargs): return ret @staticmethod - def getData(resource, usePandas=True, desc={},**kwargs): + def getData(resource, usePandas=True, desc=None, **kwargs): """ Loads JSON to pandas/dask @@ -539,10 +539,11 @@ def saveData(resource, fileName,**kwargs): return dict(crs = resource.crs ) @staticmethod - def getData(resource, desc={}, **kwargs): + def getData(resource, desc=None, **kwargs): """Load a GeoDataFrame from a GeoJSON file.""" import geopandas from hera.utils.jsonutils import loadJSON + desc = desc or {} df = geopandas.GeoDataFrame.from_features(loadJSON(resource)["features"]) if "crs" in desc: df.crs = desc['crs'] @@ -560,9 +561,10 @@ def saveData(resource, fileName,**kwargs): return dict(crs=resource.crs) @staticmethod - def getData(resource, desc={}, **kwargs): + def getData(resource, desc=None, **kwargs): """Load a GeoDataFrame from a geospatial file.""" import geopandas + desc = desc or {} df = geopandas.read_file(resource, **kwargs) if "crs" in desc: df.crs = desc['crs'] @@ -589,7 +591,7 @@ def 
saveData(resource, fileName,**kwargs): return ret @staticmethod - def getData(resource, desc={}, usePandas=False, **kwargs): + def getData(resource, desc=None, usePandas=False, **kwargs): """ Loads a parquet file to dask/pandas. @@ -627,7 +629,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads an image using the resource. @@ -657,7 +659,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads an pickled object using the resource. @@ -721,7 +723,7 @@ def saveData(resource, fileName,**kwargs): raise NotImplementedError("tif format is not implemented") @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads an pickled object using the resource. @@ -753,7 +755,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads a numpy array @@ -783,7 +785,7 @@ def saveData(resource, fileName,**kwargs): return dict() @staticmethod - def getData(resource, desc={},**kwargs): + def getData(resource, desc=None, **kwargs): """ Loads a numpy array @@ -832,30 +834,16 @@ def saveData(resource, fileName, **kwargs): @staticmethod def getData(resource, desc=None, **kwargs): - """Import and optionally instantiate a class from ``desc['classpath']``.""" + """Import and optionally instantiate a class from ``desc['classpath']``. + + Uses importlib.util.spec_from_file_location for resource-based loading + so that sys.path is never mutated by DB-supplied paths [1.6, 3.3]. 
+ """ import os - import sys import importlib + import importlib.util - # 1) Add search paths to sys.path: - # - If resource points to the package directory itself (contains __init__.py), - # also add its parent so that `import top_pkg...` resolves. - search_paths = [] - if resource: - abs_path = os.path.abspath(resource) - if os.path.isdir(abs_path): - pkg_init = os.path.join(abs_path, "__init__.py") - if os.path.isfile(pkg_init): - parent = os.path.dirname(abs_path) - if parent not in sys.path: - search_paths.append(parent) - if abs_path not in sys.path: - search_paths.append(abs_path) - # Prepend for priority (keep user-provided paths before existing ones) - for pth in reversed(search_paths): - sys.path.insert(0, pth) - - # 2) Resolve metadata + # 1) Resolve metadata desc = desc or {} classpath = desc.get("classpath") or kwargs.get("classpath") if not classpath: @@ -864,15 +852,40 @@ def getData(resource, desc=None, **kwargs): params = desc.get("parameters") or desc.get("params") or {} instantiate = desc.get("instantiate", True) - # 3) Import module and get class by name + # 2) Import module and get class by name module_name, _, class_name = classpath.rpartition(".") if not module_name or not class_name: raise ValueError( f"Invalid classpath '{classpath}'. Expected something like 'pkg.mod.Class'." ) + # Try loading via existing sys.path first (safe path) try: module = importlib.import_module(module_name) + except ModuleNotFoundError: + # Fall back to loading from resource directory without mutating sys.path + if not resource: + raise ImportError( + f"Cannot import module '{module_name}' and no resource path provided." + ) + abs_resource = os.path.abspath(resource) + parts = module_name.split(".") + # resource may point to the top-level package dir or to its parent. + # When its basename matches the first package component, strip that + # component so we look inside the package dir rather than for a + # nested sub-directory with the same name. 
+ if parts[0] == os.path.basename(abs_resource): + inner_parts = parts[1:] + else: + inner_parts = parts + module_file = os.path.join(abs_resource, os.sep.join(inner_parts) + ".py") + if not os.path.isfile(module_file): + raise ImportError( + f"Cannot find module '{module_name}' in sys.path or in {abs_resource!r}" + ) + spec = importlib.util.spec_from_file_location(module_name, module_file) + module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(module) except Exception as e: raise ImportError( f"Cannot import module '{module_name}' for classpath '{classpath}': {e}" @@ -883,9 +896,9 @@ def getData(resource, desc=None, **kwargs): except AttributeError: raise ImportError(f"Module '{module_name}' has no attribute '{class_name}'") - # 4) Merge constructor kwargs so that desc.parameters override duplicates (Option B) - call_kwargs = dict(kwargs) # baseline from **kwargs - call_kwargs.update(params) # desc.parameters WIN on duplicates + # 3) Merge constructor kwargs so that desc.parameters override duplicates + call_kwargs = dict(kwargs) + call_kwargs.update(params) - # 5) Return an instance or the class object + # 4) Return an instance or the class object return cls(**call_kwargs) if instantiate else cls diff --git a/hera/datalayer/document/__init__.py b/hera/datalayer/document/__init__.py index 07c944cd9..ffce1d6a0 100644 --- a/hera/datalayer/document/__init__.py +++ b/hera/datalayer/document/__init__.py @@ -75,7 +75,8 @@ def addOrUpdateDatabase(connectionName, username, password, databaseIP, database raise FileNotFoundError(f"{configFile} does not exis. 
Create it first by importing hera and filling up connection names") mongoConfig[connectionName] = dict(username=username, password=password, dbIP=databaseIP, dbName=databaseName) - logger.debug(f"Creating the connection with the data: {mongoConfig[connectionName]}") + safe_config = {k: v for k, v in mongoConfig[connectionName].items() if k != "password"} + logger.debug(f"Creating the connection with the data: {safe_config}") with open(configFile, 'w') as jsonFile: json.dump(mongoConfig, jsonFile, indent=4, sort_keys=True) diff --git a/hera/datalayer/project.py b/hera/datalayer/project.py index ef50ed3ce..bc23cf72f 100644 --- a/hera/datalayer/project.py +++ b/hera/datalayer/project.py @@ -11,7 +11,6 @@ from hera.datalayer.datahandler import datatypes from hera.utils.logging import get_classMethod_logger -from hera import toolkit from hera.datalayer.collection import AbstractCollection,\ Cache_Collection,\ Measurements_Collection,\ @@ -334,7 +333,7 @@ def export(self, path, export_chunk_size=1024, show_progressbar=True): for docs_batch in docs_iterator: filename = f"chunk_{i}" with zf.open(filename, 'w') as zf_archive: - pickle.dump(docs_batch, zf_archive, protocol=pickle.HIGHEST_PROTOCOL) + pickle.dump(docs_batch, zf_archive, protocol=pickle.HIGHEST_PROTOCOL) # nosec B301 — internal format, not from untrusted input; migration to JSON planned i+=1 @staticmethod @@ -353,7 +352,7 @@ def _iter_pickled_docs(zf, return_batched): """ for name in zf.namelist(): with zf.open(name) as f: - depickled_docs_batch=pickle.load(f) + depickled_docs_batch=pickle.load(f) # nosec B301 — loads from project's own export; migration to JSON planned in Part 2 if return_batched: yield depickled_docs_batch else: @@ -688,9 +687,9 @@ def getAllDocuments(self, resource=None, dataFormat=None, type=None, **desc): List of documents. 
""" docs = [] - docs.extend(self.getSimulationsDocuments(resource=resource, dataFormat=dataFormat, type=type, desc=desc)) - docs.extend(self.getMeasurementsDocuments(resource=resource, dataFormat=dataFormat, type=type, desc=desc)) - docs.extend(self.getCacheDocuments(resource=resource, dataFormat=dataFormat, type=type, desc=desc)) + docs.extend(self.getSimulationsDocuments(resource=resource, dataFormat=dataFormat, type=type, **desc)) + docs.extend(self.getMeasurementsDocuments(resource=resource, dataFormat=dataFormat, type=type, **desc)) + docs.extend(self.getCacheDocuments(resource=resource, dataFormat=dataFormat, type=type, **desc)) return docs def addDocumentFromDict(self,documentDict): @@ -726,7 +725,7 @@ def addDocumentFromDict(self,documentDict): addingFunc(**addingDict) - def addMeasurementsDocument(self, resource="", dataFormat="string", type="", desc={}): + def addMeasurementsDocument(self, resource="", dataFormat="string", type="", desc=None): """ Adds a new measurement document. @@ -748,6 +747,8 @@ def addMeasurementsDocument(self, resource="", dataFormat="string", type="", des ------- The new document """ + if desc is None: + desc = {} logger = get_classMethod_logger(self, "init") if self.projectName == self.DEFAULTPROJECT and not self._allowWritingToDefaultProject: err = f"project {self.projectName} is read-only. " @@ -814,7 +815,7 @@ def getSimulationsDocuments(self, resource=None, dataFormat=None, type=None, **d return self.simulations.getDocuments(projectName=self._projectName, resource=resource, dataFormat=dataFormat, type=type, **desc) - def addSimulationsDocument(self, resource="", dataFormat="string", type="", desc={}): + def addSimulationsDocument(self, resource="", dataFormat="string", type="", desc=None): """ Adds a new simulations.old document. 
@@ -836,6 +837,8 @@ def addSimulationsDocument(self, resource="", dataFormat="string", type="", desc ------- The new document """ + if desc is None: + desc = {} logger = get_classMethod_logger(self, "init") if self.projectName == self.DEFAULTPROJECT and not self._allowWritingToDefaultProject: err = f"project {self.projectName} is read-only. " @@ -901,7 +904,7 @@ def getCacheDocuments(self, resource=None, dataFormat=None, type=None, **desc): return self.cache.getDocuments(projectName=self._projectName, resource=resource, dataFormat=dataFormat, type=type, **desc) - def addCacheDocument(self, resource="", dataFormat="string", type="", desc={}): + def addCacheDocument(self, resource="", dataFormat="string", type="", desc=None): """ Adds a new cache document. @@ -923,6 +926,8 @@ def addCacheDocument(self, resource="", dataFormat="string", type="", desc={}): ------- The new document """ + if desc is None: + desc = {} logger = get_classMethod_logger(self, "init") if self.projectName == self.DEFAULTPROJECT and not self._allowWritingToDefaultProject: err = f"project {self.projectName} is read-only. 
" diff --git a/hera/measurements/GIS/raster/srtm_datasource.json b/hera/measurements/GIS/raster/srtm_datasource.json index 17a319b96..e1f055ed9 100644 --- a/hera/measurements/GIS/raster/srtm_datasource.json +++ b/hera/measurements/GIS/raster/srtm_datasource.json @@ -1,7 +1,8 @@ { "SRTMGL1": { + "isRelativePath": "True", "item": { - "resource": "/home/ilay/hera/hera/tests/UNIT_TEST_GIS_RASTER_TOPOGRAPHY/N33E035.hgt", + "resource": "hera/tests/UNIT_TEST_GIS_RASTER_TOPOGRAPHY/N33E035.hgt", "dataFormat": "SRTM", "valueType": "Elevation", "desc": {} diff --git a/hera/measurements/GIS/vector/buildings/analysis.py b/hera/measurements/GIS/vector/buildings/analysis.py index bd3f42b86..3b00f9d15 100644 --- a/hera/measurements/GIS/vector/buildings/analysis.py +++ b/hera/measurements/GIS/vector/buildings/analysis.py @@ -9,6 +9,7 @@ from shapely.geometry import box, Polygon from .....utils.logging import get_classMethod_logger from .._io_utils import readGeoJSONString, GEO_READ_ERRORS +from ...utils import ITM BUILDINGS_LAMBDA_WIND_DIRECTION = 'wind' BUILDINGS_LAMBDA_RESOLUTION = 'resolution' @@ -618,12 +619,12 @@ def Lambda(self, buildings, windDirection): bounding = bsboundingsmall cityname = 'bssm' - bt.addRegion(bounding, cityname, crs=2039) + bt.addRegion(bounding, cityname, crs=ITM) if 5 == 5: - reg = bt.cutRegionFromSource(cityname, datasourceName='BNTL', isBounds=True, crs=2039) + reg = bt.cutRegionFromSource(cityname, datasourceName='BNTL', isBounds=True, crs=ITM) # bt.regionToSTL(cityname,cityname+'-buildings.stl','BNTL') print('dddeeebbb') - lm = bt.analysis.LambdaFromDatasource(270, 250, reg, 'BNTL', crs=2039, overwrite=True) + lm = bt.analysis.LambdaFromDatasource(270, 250, reg, 'BNTL', crs=ITM, overwrite=True) print(lm) file = open(cityname + '-lambda1.csv', 'w') file.writelines('[') @@ -640,7 +641,7 @@ def Lambda(self, buildings, windDirection): file.close() if 5 == 6: - lm = bt._analysis.LambdaFromDatasource(270, 250, reg, 'BNTL', crs=2039) + lm = 
bt._analysis.LambdaFromDatasource(270, 250, reg, 'BNTL', crs=ITM) print(lm) file = open(cityname + '-lambda.csv', 'w') file.writelines('[') @@ -658,7 +659,7 @@ def Lambda(self, buildings, windDirection): if 5 == 6: bt2 = toolkitHome.getToolkit(toolkitName=toolkitHome.GIS_TOPOGRAPHY, projectName="testbamba") - bt2.addRegion(bounding, cityname, crs=2039) + bt2.addRegion(bounding, cityname, crs=ITM) # reg = bt2.cutRegionFromSource('bs',datasourceName='BNTL',isBounds = True, crs = 2039) topo = bt2.regionToSTL(bounding, 50, 'BNTL') file1 = open(cityname + '-topo.stl', 'w') diff --git a/hera/measurements/GIS/vector/topography.py b/hera/measurements/GIS/vector/topography.py index 50f7c7123..7d4fd679c 100644 --- a/hera/measurements/GIS/vector/topography.py +++ b/hera/measurements/GIS/vector/topography.py @@ -17,6 +17,7 @@ from ....toolkit import TOOLKIT_SAVEMODE_ONLYFILE from ._io_utils import readGeoJSONString from .toolkit import VectorToolkit +from ..utils import ITM from ..utils import stlFactory class TopographyToolkit(VectorToolkit): @@ -81,7 +82,7 @@ def cutRegionFromSource(self, shapeDataOrName, datasourceName, isBounds = False, shape = self._RegionToGeopandas(shapeDataOrName, crs=crs) doc = self.getDatasourceDocument(datasourceName=datasourceName) logger.debug(f"The datasource {datasourceName} is pointing to {doc.resource}") - doc.desc['desc'].update({'crs': 2039}) + doc.desc['desc'].update({'crs': ITM}) if 'crs' not in doc.desc['desc']: logger.error(f"The datasource {datasourceName} has no CRS defined in the metadata. please add it") raise ValueError(f"The datasource {datasourceName} has no CRS defined in the metadata. 
please add it") @@ -295,7 +296,7 @@ def addHeight(self, data, groundData, coord1="x", coord2="y", coord3="z", resolu if saveMode in [toolkit.TOOLKIT_SAVEMODE_FILEANDDB_REPLACE, toolkit.TOOLKIT_SAVEMODE_FILEANDDB]: - regionDoc = self.datalayer.getCacheDcouments(resource=file, dataFormat="parquet",type="cellData", desc=dict(resolution=resolution,**kwargs)) + regionDoc = self.datalayer.getCacheDocuments(resource=file, dataFormat="parquet",type="cellData", **dict(resolution=resolution,**kwargs)) if len(regionDoc) >0 and saveMode== toolkit.TOOLKIT_SAVEMODE_FILEANDDB: raise ValueError(f"{file} already exists in the DB") diff --git a/hera/measurements/experiment/parsers.py b/hera/measurements/experiment/parsers.py index 236aa257c..44cdb5827 100644 --- a/hera/measurements/experiment/parsers.py +++ b/hera/measurements/experiment/parsers.py @@ -118,8 +118,7 @@ def _getLists(self,metadata,descriptionData): devicePropertiesDict = dict() trialPropertiesDict = dict() - count_Sonic = 0 - count_TRH = 0 + inst_counters = {} for stn in metadata['stations'].keys(): stnmd = metadata['stations'][stn] @@ -132,11 +131,8 @@ def _getLists(self,metadata,descriptionData): for devHgt in stnmd['instruments'][inst]: - if inst == 'Sonic': - count_Sonic += 1 - if inst == 'TRH': - count_TRH += 1 - counter=eval(f'count_{inst}') + inst_counters[inst] = inst_counters.get(inst, 0) + 1 + counter = inst_counters[inst] deviceName= f'{inst}_{counter}' deviceDataPath=os.path.join(descriptionData['pathToData'],stn,inst,devHgt) diff --git a/hera/measurements/meteorology/highfreqdata/analysis/turbulencestatistics.py b/hera/measurements/meteorology/highfreqdata/analysis/turbulencestatistics.py index ba2681811..8f7e2ab85 100644 --- a/hera/measurements/meteorology/highfreqdata/analysis/turbulencestatistics.py +++ b/hera/measurements/meteorology/highfreqdata/analysis/turbulencestatistics.py @@ -5,6 +5,7 @@ import dask.dataframe from scipy.constants import g from .abstractcalculator import AbstractCalculator +from 
hera.utils import toMeteorologicalAngle class singlePointTurbulenceStatistics(AbstractCalculator): @@ -1398,7 +1399,7 @@ def fluctuations(self, inMemory=None): avg['wind_dir_bar'] = (2 * numpy.pi + avg['wind_dir_bar']) % (2 * numpy.pi) avg['wind_dir_bar'] = numpy.rad2deg(avg['wind_dir_bar']) - avg['wind_dir_bar'] = avg['wind_dir_bar'].apply(lambda x: 270 - x if 270 - x >= 0 else 630 - x) + avg['wind_dir_bar'] = avg['wind_dir_bar'].apply(toMeteorologicalAngle) self._TemporaryData = avg self._CalculatedParams += [['u_bar',{}], ['v_bar',{}], ['w_bar',{}], ['T_bar',{}]] @@ -1414,7 +1415,7 @@ def fluctuations(self, inMemory=None): self._RawData['wind_dir'] = numpy.arctan2(self._RawData['v'], self._RawData['u']) self._RawData['wind_dir'] = (2 * numpy.pi + self._RawData['wind_dir']) % (2 * numpy.pi) self._RawData['wind_dir'] = numpy.rad2deg(self._RawData['wind_dir']) - self._RawData['wind_dir'] = self._RawData['wind_dir'].apply(lambda x: 270 - x if 270 - x >= 0 else 630 - x) + self._RawData['wind_dir'] = self._RawData['wind_dir'].apply(toMeteorologicalAngle) self._RawData['up'] = self._RawData['u'] - self._RawData['u_bar'] self._RawData['vp'] = self._RawData['v'] - self._RawData['v_bar'] diff --git a/hera/riskassessment/agents/effects/thresholdGeoDataFrame.py b/hera/riskassessment/agents/effects/thresholdGeoDataFrame.py index 56394e689..8cf8be9da 100755 --- a/hera/riskassessment/agents/effects/thresholdGeoDataFrame.py +++ b/hera/riskassessment/agents/effects/thresholdGeoDataFrame.py @@ -4,6 +4,7 @@ import geopandas from ....utils import toMeteorologicalAngle,toMathematicalAngle +from ....measurements.GIS import ITM import geopandas as gpd @@ -165,7 +166,7 @@ def _project(self,demographic,loc,meteorological_angle=None,mathematical_angle=N Casualty estimates per severity and time step, or None if no population is affected. """ - localcrs = {"init":"epsg:2039"} # itm + localcrs = ITM demog_data = demographic demog_data = demog_data.to_crs(localcrs) # convert to itm. 
It is in m**2. diff --git a/hera/riskassessment/analysis/riskAreas.py b/hera/riskassessment/analysis/riskAreas.py index 9539be761..847dc9390 100644 --- a/hera/riskassessment/analysis/riskAreas.py +++ b/hera/riskassessment/analysis/riskAreas.py @@ -8,9 +8,7 @@ import multiprocessing from functools import partial -#from ...utils import toMeteorologicalAngle,toMathematicalAngle -toMeteorologicalAngle = lambda mathematical_angle: (270 - mathematical_angle) if ((270 - mathematical_angle) >= 0) else (630 - mathematical_angle) -toMathematicalAngle = toMeteorologicalAngle +from ...utils import toMeteorologicalAngle, toMathematicalAngle def getRiskAreaAlgorithm(algorithmName,**kwargs): """ Return an estimator class. diff --git a/hera/riskassessment/protectionpolicy/ProtectionPolicy.py b/hera/riskassessment/protectionpolicy/ProtectionPolicy.py index 37681e108..ad80673c5 100644 --- a/hera/riskassessment/protectionpolicy/ProtectionPolicy.py +++ b/hera/riskassessment/protectionpolicy/ProtectionPolicy.py @@ -96,7 +96,7 @@ def data(self): """ return self._data - def __init__(self,actionList=[],x="x",y="y",datetime="datetime"): + def __init__(self,actionList=None,x="x",y="y",datetime="datetime"): """ A basic action list. 
@@ -107,8 +107,10 @@ def __init__(self,actionList=[],x="x",y="y",datetime="datetime"): } ] """ + if actionList is None: + actionList = [] self._xname = x - self._yname = y + self._yname = y self._datetimename = datetime self._actionList = [] self._finalname = "C" diff --git a/hera/simulations/LSM/CLI.py b/hera/simulations/LSM/CLI.py index 97c0bd780..69dc5231d 100644 --- a/hera/simulations/LSM/CLI.py +++ b/hera/simulations/LSM/CLI.py @@ -65,7 +65,12 @@ def setup_template(arguments): logger.info(f"setup directories successfully") - os.system(f"ln -s {os.path.join(codeDir,'a.out')} {outDir}") - logger.info(f"linked {os.path.join(outDir,'a.out')} from template") - os.system(f"ln -s {os.path.join(codeDir,'tozaot/Meteorology')} {metDir}") - logger.info(f"linked {os.path.join(codeDir,'tozaot/Meteorology')} from template") + src_aout = os.path.join(codeDir, 'a.out') + dst_aout = os.path.join(outDir, 'a.out') + if not os.path.exists(dst_aout): + os.symlink(src_aout, dst_aout) + logger.info(f"linked {dst_aout} from template") + src_met = os.path.join(codeDir, 'tozaot', 'Meteorology') + if not os.path.exists(metDir): + os.symlink(src_met, metDir) + logger.info(f"linked {src_met} from template") diff --git a/hera/simulations/LSM/hermesWorkflowToolkit.py b/hera/simulations/LSM/hermesWorkflowToolkit.py index c0794b009..8afbd61a6 100644 --- a/hera/simulations/LSM/hermesWorkflowToolkit.py +++ b/hera/simulations/LSM/hermesWorkflowToolkit.py @@ -2,6 +2,8 @@ from typing import Union import pandas import shutil +import shlex +import subprocess import os from ..toolkit import abstractToolkit from ..utils import loadJSON,compareJSONS @@ -594,7 +596,7 @@ def addWorkflowToGroup(self, schedulerHost=schedulerHost, schedulerPort=schedulerPort) self.logger.debug(executionStr) - os.system(executionStr) + subprocess.run(shlex.split(executionStr), check=True) diff --git a/hera/simulations/LSM/template.py b/hera/simulations/LSM/template.py index 2e0fa7367..75caeab62 100644 --- 
a/hera/simulations/LSM/template.py +++ b/hera/simulations/LSM/template.py @@ -192,13 +192,20 @@ def run(self,topography=None, stations=None,canopy=None,params=dict(),deposition if saveMode == toolkit.TOOLKIT_SAVEMODE_ONLYFILE: raise ValueError(f"The outputfile {os.path.join(saveDir,'netcdf')} exists. Either remove it or run a saveMode that ends with _REPLACE") elif saveMode == toolkit.TOOLKIT_SAVEMODE_ONLYFILE_REPLACE: - os.system(f"rm -r {os.path.join(saveDir,'netcdf')}") + import shutil + shutil.rmtree(os.path.join(saveDir, 'netcdf'), ignore_errors=True) ## If overwrite, or document does not exist in DB, or running without DB. os.makedirs(saveDir, exist_ok=True) print([x for x in os.listdir(self.modelFolder)]) - os.system('cp -rf %s %s' % (os.path.join(self.modelFolder, '*'), saveDir)) + import shutil as _shutil + for _src in glob.glob(os.path.join(self.modelFolder, '*')): + _dst = os.path.join(saveDir, os.path.basename(_src)) + if os.path.isdir(_src): + _shutil.copytree(_src, _dst, dirs_exist_ok=True) + else: + _shutil.copy2(_src, _dst) logger.info(f"copied contents from {self.modelFolder} to {saveDir}") # write to file. ifmc.render(os.path.join(saveDir, 'INPUT')) @@ -271,7 +278,8 @@ def run(self,topography=None, stations=None,canopy=None,params=dict(),deposition logger.info("running the model") # run the model. 
- lsm_return = os.system('./a.out') + import subprocess as _subprocess + lsm_return = _subprocess.run(['./a.out']).returncode logger.info("simulation finished running") logger.info(f"returning context back to {cur_dir}") os.chdir(cur_dir) @@ -306,8 +314,9 @@ def run(self,topography=None, stations=None,canopy=None,params=dict(),deposition logger.info(f"saved xarray in {netcdf_output}") if not self.forceKeep: machsanPath = os.path.dirname(results_full_path) - allfiles = os.path.join(machsanPath ,"*") - os.system(f"rm {allfiles}") + for _f in glob.glob(os.path.join(machsanPath, '*')): + if os.path.isfile(_f): + os.remove(_f) if saveMode != toolkit.TOOLKIT_SAVEMODE_NOSAVE: finalxarray.to_netcdf(os.path.join(netcdf_output, "data%s.nc" % i)) diff --git a/hera/simulations/WRF/wrfDatalayer.py b/hera/simulations/WRF/wrfDatalayer.py index cd0ce8bea..b56abe8ba 100644 --- a/hera/simulations/WRF/wrfDatalayer.py +++ b/hera/simulations/WRF/wrfDatalayer.py @@ -12,6 +12,7 @@ print("You must install python-wrf to use this package ") import xarray +from hera.measurements.GIS import WSG84, ITM class wrfDatalayer(): @@ -81,8 +82,8 @@ def getPandas(self, datapath, Time=None, lat=None, lon=None, heightLimit=None, c if lat is not None: if lat > 360: geo = geopandas.GeoDataFrame(dict(geometry=geopandas.points_from_xy([compare_lon],[lat])),index=[0]) - geo.crs = 2039 - geo = geo.to_crs(epsg=4326) + geo.crs = ITM + geo = geo.to_crs(epsg=WSG84) lat = geo.geometry[0].y changes = ["south_north", "south_north", "south_north_stag"] request_i_u = request_i = self.find_i(lat, xdata, "south_north", "XLAT") @@ -91,8 +92,8 @@ def getPandas(self, datapath, Time=None, lat=None, lon=None, heightLimit=None, c elif lon is not None: if lon > 360: geo = geopandas.GeoDataFrame(dict(geometry=geopandas.points_from_xy([lon],[compare_lat])),index=[0]) - geo.crs = 2039 - geo = geo.to_crs(epsg=4326) + geo.crs = ITM + geo = geo.to_crs(epsg=WSG84) lon = geo.geometry[0].x changes = ["west_east", "west_east_stag", 
"west_east"] request_i_v = request_i = self.find_i(lon, xdata, "west_east", "XLONG") @@ -147,8 +148,8 @@ def getPandas(self, datapath, Time=None, lat=None, lon=None, heightLimit=None, c d = pandas.concat([d, new_d]) gdf = geopandas.GeoDataFrame(d, geometry=geopandas.points_from_xy(d.LONG, d.LAT)) - gdf.crs = {'init' :'epsg:4326'} - gdf = gdf.to_crs(epsg=2039) + gdf.crs = WSG84 + gdf = gdf.to_crs(epsg=ITM) gdf["LAT"] = gdf.geometry.y gdf["LONG"] = gdf.geometry.x gdf["height_over_terrain"] = gdf.height - gdf.terrain diff --git a/hera/simulations/hermesWorkflowToolkit.py b/hera/simulations/hermesWorkflowToolkit.py index f3d2ea479..4b92e272c 100644 --- a/hera/simulations/hermesWorkflowToolkit.py +++ b/hera/simulations/hermesWorkflowToolkit.py @@ -5,6 +5,8 @@ from typing import Union import pandas import shutil +import shlex +import subprocess import os from collections.abc import Iterable from hera.toolkit import abstractToolkit @@ -832,7 +834,7 @@ def executeWorkflowFromDB(self, nameOrWorkflowFileOrJSONOrResource, schedulerHost=schedulerHost, schedulerPort=schedulerPort) logger.debug(executionStr) - os.system(executionStr) + subprocess.run(shlex.split(executionStr), check=True) # Step 6: Clean up the generated Python module (the workflow JSON stays). 
logger.info(f"Cleaning the executer python for {workflowName}") diff --git a/hera/simulations/machineLearningDeepLearning/dataanalisys/ml.py b/hera/simulations/machineLearningDeepLearning/dataanalisys/ml.py index 71015b9e7..130779bb4 100644 --- a/hera/simulations/machineLearningDeepLearning/dataanalisys/ml.py +++ b/hera/simulations/machineLearningDeepLearning/dataanalisys/ml.py @@ -583,14 +583,3 @@ def corr_vector(testyux, testyuy, testyuz, predictvaluesx, predictvaluesy, predi # p.terminate() # p.join() -if __name__ == '__main__': - print('main ml') - gridztlv = np.load('/ibdata2/nirb/Projects/tlvz.npy') - griduxtlv = np.load('/ibdata2/nirb/Projects/tlvux.npy') - - - ml2 = ml() - clf2, scaler, score = ml2.fit(features0, labelsu20, show='U3', featurestest = features1, labeltest = labelsu21) - ml2.save('ml3-'+learnfile) - - u2ml = ml2.predict(features1) diff --git a/hera/simulations/openFoam/lagrangian/LSM/toolkit.py b/hera/simulations/openFoam/lagrangian/LSM/toolkit.py index 73668b9f0..2bab1b7ce 100644 --- a/hera/simulations/openFoam/lagrangian/LSM/toolkit.py +++ b/hera/simulations/openFoam/lagrangian/LSM/toolkit.py @@ -1,6 +1,8 @@ import glob import pandas import os +import shutil +import subprocess import xarray import numpy from dask.delayed import delayed @@ -714,17 +716,21 @@ def createRootCaseMeshLink(self, rootCase): proc = os.path.split(fl)[-1] destination = os.path.join(os.path.abspath(proc), "3600") os.makedirs(os.path.dirname(destination), exist_ok=True) - os.system(f"cp {fullpath} {destination} -rT") + if os.path.exists(destination): + shutil.rmtree(destination) + shutil.copytree(fullpath, destination) fullpath = os.path.abspath(os.path.join(fl, "constant", "polyMesh")) destination = os.path.join(os.path.abspath(proc), "constant", "polyMesh") os.makedirs(os.path.dirname(destination), exist_ok=True) - os.system(f"ln -s {fullpath} {destination}") + if not os.path.exists(destination): + os.symlink(fullpath, destination) # link the root dir . 
curdir = os.path.abspath(os.path.join("rootCase", os.path.basename(fl))) targetdir = os.path.abspath(os.path.join(fl, "rootCase")) - os.system(f"ln -s {curdir} {targetdir} ") + if not os.path.exists(targetdir): + os.symlink(curdir, targetdir) def to_paraview_CSV(self, data, outputdirectory, filename, timeFactor=1): """ diff --git a/hera/simulations/openFoam/lagrangian/abstractLagrangianSolver.py b/hera/simulations/openFoam/lagrangian/abstractLagrangianSolver.py index 5f76ffcff..ccd216a71 100644 --- a/hera/simulations/openFoam/lagrangian/abstractLagrangianSolver.py +++ b/hera/simulations/openFoam/lagrangian/abstractLagrangianSolver.py @@ -620,9 +620,11 @@ def createAndLinkDispersionCaseDirectory(self, dispersionDirectory, dispersionFl logger.debug(f"\t Linking: ln -s {fullpath} {destination}") logger.debug( f"\t Linking root case : ln -s {os.path.abspath(proc)} {os.path.join(dispersionDirectory, os.path.basename(proc))}/rootCase") - os.system(f"ln -s {fullpath} {destination}") - os.system( - f"ln -s {os.path.abspath(proc)} {os.path.join(dispersionDirectory, os.path.basename(proc))}/rootCase") + if not os.path.exists(destination): + os.symlink(fullpath, destination) + rootcase_link = os.path.join(dispersionDirectory, os.path.basename(proc), "rootCase") + if not os.path.exists(rootcase_link): + os.symlink(os.path.abspath(proc), rootcase_link) # create the 0 directory in all processors. os.makedirs(os.path.join(dispersionDirectory, os.path.basename(proc), '0'), exist_ok=True) @@ -636,7 +638,9 @@ def createAndLinkDispersionCaseDirectory(self, dispersionDirectory, dispersionFl # linking the rootCase in the root directory of the dispersion dispersionDirectory. 
logger.debug( f"Linking the root case: ln -s {dispersionFlowDirectory} {os.path.join(dispersionDirectory, 'rootCase')}") - os.system(f"ln -s {dispersionFlowDirectory} {os.path.join(dispersionDirectory, 'rootCase')}") + rootcase_root = os.path.join(dispersionDirectory, 'rootCase') + if not os.path.exists(rootcase_root): + os.symlink(dispersionFlowDirectory, rootcase_root) # create the 0 directory in the root. logger.debug(f"Making the 0 in {os.path.join(dispersionDirectory, '0')}") @@ -1742,15 +1746,17 @@ def robustOpenFOAMFileValuesParser(path, columnNames): START_OF_FILE_VALUES = 18 FILE_SAMPLE_SIZE = 2048 - sed_command = ( - f"sed -e '1,{START_OF_FILE_VALUES}d' " - f"-e 's/[()]//g' " - f"-e 's/^[[:space:]]*//; s/[[:space:]]*$//' " - f"-e 's/[[:space:]]\\+/,/g' " - f"-e '/^$/d' " - f"-e '/\\/\\//d' " - f"{path}" - ) + sed_command = [ + "sed", + "-e", f"1,{START_OF_FILE_VALUES}d", + "-e", r"s/[()]//g", + "-e", r"s/^[[:space:]]*//", + "-e", r"s/[[:space:]]*$//", + "-e", r"s/[[:space:]]\+/,/g", + "-e", r"/^$/d", + "-e", r"/\/\//d", + path, + ] with open(path, 'r') as f: content = f.read(FILE_SAMPLE_SIZE) @@ -1761,12 +1767,12 @@ def robustOpenFOAMFileValuesParser(path, columnNames): # Clean up the value string (remove brackets if vector) val_str = uniform_match.group(2).replace('(', '').replace(')', '') single_val = numpy.array([float(x) for x in val_str.split()]) - + data = numpy.tile(single_val, (count, 1)) return pandas.DataFrame(data, columns=columnNames).astype(float) # failing in the process should give an unexpected error meaning we missed some case - proc = subprocess.run(sed_command, shell=True, capture_output=True, text=True, check=True) + proc = subprocess.run(sed_command, capture_output=True, text=True, check=True) if not proc.stdout.strip(): return pandas.DataFrame(columns=columnNames).astype(float) diff --git a/hera/simulations/openFoam/toolkit.py b/hera/simulations/openFoam/toolkit.py index e0d148870..3510ba89e 100644 --- 
a/hera/simulations/openFoam/toolkit.py +++ b/hera/simulations/openFoam/toolkit.py @@ -1,6 +1,7 @@ import numpy import os import glob +import subprocess import dask import pandas import shutil @@ -99,7 +100,7 @@ def runOFSimulation(self,nameOrWorkflowFileOrJSONOrResource, for doc in docList: logger.info(f"Executing {doc.desc['workflowName']}") os.chdir(doc.resource) - os.system("./Allrun") + subprocess.run(["./Allrun"], check=True) def prepareSlurmWorkflowExecution(self,baseConfiguration, jsonVariations, @@ -297,7 +298,11 @@ def getMesh(self, caseDirectory, readParallel=True, time=0): caseType = "decomposed" if useParallel else "composed" if not os.path.exists(checkPath): logger.debug(f"Cell centers does not exist in {caseType} case. Calculating...") - os.system(f"foamJob {parallelExec} {casePointer} -wait postProcess -func writeCellCentres -time {time}") + foam_cmd = ["foamJob"] + if parallelExec: + foam_cmd.append("-parallel") + foam_cmd.extend([str(casePointer), "-wait", "postProcess", "-func", "writeCellCentres", "-time", str(time)]) + subprocess.run(foam_cmd, check=False) logger.debug(f"done: foamJob {parallelExec} -wait postProcess -func writeCellCentres {casePointer} -time {time}") if not os.path.exists(checkPath): logger.error("Error running the writeCellCentres. Executing writeCellCentres failed. 
Are you sure that the openFOAM environment is set?"\ diff --git a/hera/simulations/windProfile/toolkit.py b/hera/simulations/windProfile/toolkit.py index 5af229475..5cc7559f0 100644 --- a/hera/simulations/windProfile/toolkit.py +++ b/hera/simulations/windProfile/toolkit.py @@ -146,7 +146,7 @@ def _getWindSpeedDirection(self,stations,IMS_TOKEN): data = json.loads(response.text.encode('utf8')) datetime_str = data['data'][0]['datetime'] break - except: + except Exception: trials += 1 # print(f"Trial {trials} for Station {station_id}") if data: diff --git a/hera/tests/conftest.py b/hera/tests/conftest.py index 319430274..4e0a158e1 100644 --- a/hera/tests/conftest.py +++ b/hera/tests/conftest.py @@ -24,6 +24,7 @@ Set to "1" to generate expected output files instead of comparing. """ +import glob import json import math import os @@ -49,6 +50,108 @@ PYTEST_PROJECT_NAME = "PYTEST_HERA_PROJECT" +# --------------------------------------------------------------------------- +# Cleanup primitives — leave ZERO traces (DB documents + on-disk directories) +# --------------------------------------------------------------------------- +# +# A Hera project "exists" (appears in ``getProjectList``) as long as ANY +# document carries its ``projectName`` — including the hidden +# ``__config__`` Cache document that ``Project`` creates on +# construction. Deleting only measurement/simulation documents therefore +# leaks the project. On disk, ``Project.__init__`` unconditionally creates +# ``~/.hera/`` (or whatever ``filesDirectory`` resolves to) and +# never removes it. These helpers tear down both halves completely. + +# Projects that must never be purged (shared / framework-owned). +_PROTECTED_PROJECTS = frozenset({"", "defaultProject"}) + +# Unambiguous project-name patterns owned by the test suite. Projects matching +# these are safe to purge even if they predate the session (they are leftovers +# from earlier, incompletely-cleaned runs). 
+_TEST_PROJECT_PREFIXES = ("pytest_", "unittest_project_dynamic_") +_TEST_PROJECT_EXACT = frozenset({"PYTEST_HERA_PROJECT", "REPOSITORY_PROJECT_TESTING_01"}) + + +def _is_test_project(projectName): + """True if a project name is unambiguously created by this test suite.""" + if projectName in _PROTECTED_PROJECTS: + return False + if projectName in _TEST_PROJECT_EXACT: + return True + return any(projectName.startswith(p) for p in _TEST_PROJECT_PREFIXES) + + +def purge_project_db(projectName): + """Delete every document of a project across all three collections. + + Uses the collection layer directly (NOT ``Project(...)``) so the purge has + no side effects — constructing a ``Project`` would re-create the + ``__config__`` document and its files directory. Removing the config + document is what actually makes the project disappear from + ``getProjectList``. + """ + if projectName in _PROTECTED_PROJECTS: + return + try: + from hera.datalayer.collection import AbstractCollection + # AbstractCollection (type=None) spans Measurements + Simulations + Cache. + AbstractCollection().deleteDocuments(projectName=projectName) + except Exception: + pass + + +def purge_project_dirs(projectName): + """Remove the on-disk directories a project may have created. + + Covers the default ``~/.hera/`` location and a stray + ``/`` directory (produced by historically relative + ``filesDirectory`` configurations). + """ + if projectName in _PROTECTED_PROJECTS: + return + candidates = [ + os.path.join(os.path.expanduser("~"), ".hera", projectName), + os.path.join(os.getcwd(), projectName), + ] + for d in candidates: + shutil.rmtree(d, ignore_errors=True) + + +def purge_project(projectName): + """Full teardown for a project: DB documents first, then on-disk directories. + + Order matters: the config document is deleted *before* the directories so + that no lingering ``filesDirectory`` config can resurrect a directory via a + later ``Project`` construction. 
+ """ + purge_project_db(projectName) + purge_project_dirs(projectName) + + +def purge_test_disk_artifacts(): + """Remove on-disk directories left by the test suite, matched by name. + + Runs independently of DB state: a project may be cleaned from MongoDB by + its own fixture teardown yet still leave a directory behind (e.g. when an + old, relative ``filesDirectory`` config pointed it at the current working + directory). Also sweeps the temporary ``filesDirectory`` trees created by + the session/function fixtures. + """ + # Test-named project directories under ~/.hera and the working directory. + for root in (os.path.join(os.path.expanduser("~"), ".hera"), os.getcwd()): + if not os.path.isdir(root): + continue + for entry in os.listdir(root): + full = os.path.join(root, entry) + if os.path.isdir(full) and _is_test_project(entry): + shutil.rmtree(full, ignore_errors=True) + + # Temporary files-directory trees created by the test fixtures. + for pattern in ("hera_pytest_main_*", "hera_pytest_func_*", "hera_exp_test_*"): + for d in glob.glob(os.path.join(tempfile.gettempdir(), pattern)): + shutil.rmtree(d, ignore_errors=True) + + # --------------------------------------------------------------------------- # CLI option: --result-set # --------------------------------------------------------------------------- @@ -62,6 +165,54 @@ def pytest_addoption(parser): ) +# --------------------------------------------------------------------------- +# Session safety-net: guarantee zero leaked projects / directories +# --------------------------------------------------------------------------- + +@pytest.fixture(scope="session", autouse=True) +def _no_trace_guard(): + """Purge every project created during the test session. + + This is a defense-in-depth backstop: individual fixtures clean up after + themselves, but tests that create projects directly (or via CLI / notebook + subprocesses, e.g. ``unittest_project_dynamic_``) can still slip + through. 
We snapshot the project list at session start and, at the end, + purge anything that appeared during the session. + + Only *newly created* projects are touched — anything that already existed + before the session is left completely alone, so this can never disturb a + developer's pre-existing data. + """ + from hera.datalayer.project import getProjectList + + try: + before = set(getProjectList()) + except Exception: + before = set() + + yield + + try: + after = set(getProjectList()) + except Exception: + return + + # Purge (a) every project created during this session, plus (b) any project + # matching a known test-name pattern — the latter mops up leftovers from + # earlier incompletely-cleaned runs. Pre-existing projects that are NOT + # test artifacts are never touched. + new_projects = after - before + leftover_test_projects = {p for p in after if _is_test_project(p)} + for projectName in sorted(new_projects | leftover_test_projects): + if projectName in _PROTECTED_PROJECTS: + continue + purge_project(projectName) + + # Sweep disk artifacts whose owning project was already removed from the DB + # by its own fixture teardown (so the loop above never saw it). + purge_test_disk_artifacts() + + # --------------------------------------------------------------------------- # Session-scoped: test data root and configuration # --------------------------------------------------------------------------- @@ -130,12 +281,11 @@ def hera_test_project(test_hera_root): basedir = str(test_hera_root) - # Create the project, then forcibly redirect its files directory to /tmp - # so test runs never litter the repository root. + # Point the project's files directory at /tmp from the start so test runs + # never litter ``~/.hera`` or the repository root. Passing ``filesDirectory`` + # to the constructor means ``~/.hera/`` is never created. 
_files_tmp = tempfile.mkdtemp(prefix="hera_pytest_main_") - proj = Project(projectName=PYTEST_PROJECT_NAME) - proj._FilesDirectory = _files_tmp - proj.setConfig(filesDirectory=_files_tmp) + proj = Project(projectName=PYTEST_PROJECT_NAME, filesDirectory=_files_tmp) # Load all datasources + configs into the project dt = dataToolkit() @@ -148,12 +298,12 @@ def hera_test_project(test_hera_root): yield proj - # Teardown: remove all documents created during the session - try: - for doc in proj.getMeasurementsDocuments(): - doc.delete() - except Exception: - pass + # Teardown: delete ALL documents (incl. the __config__ doc) so the project + # disappears entirely, THEN remove the temporary files directory. Deleting + # the config first prevents any later Project() open from resurrecting the + # directory via its saved filesDirectory. + purge_project_db(PYTEST_PROJECT_NAME) + purge_project_dirs(PYTEST_PROJECT_NAME) shutil.rmtree(_files_tmp, ignore_errors=True) @@ -213,16 +363,11 @@ def project_fixture(): project_name = "pytest_temp_project" _files_tmp = tempfile.mkdtemp(prefix="hera_pytest_func_") - proj = Project(projectName=project_name) - proj._FilesDirectory = _files_tmp - proj.setConfig(filesDirectory=_files_tmp) + proj = Project(projectName=project_name, filesDirectory=_files_tmp) yield proj - # Cleanup: remove all documents created during the test - try: - for doc in proj.getMeasurementsDocuments(): - doc.delete() - except Exception: - pass + # Cleanup: delete ALL documents (incl. config) then the files directory. 
+ purge_project_db(project_name) + purge_project_dirs(project_name) shutil.rmtree(_files_tmp, ignore_errors=True) @@ -379,11 +524,8 @@ def compare_outputs(result, expected, output_type): compare = funcs.get(output_type) if compare: - try: - ok = compare() - return bool(ok) - except Exception: - return False + ok = compare() + return bool(ok) return False diff --git a/hera/tests/dynamic_loading_tests_pack/conftest.py b/hera/tests/dynamic_loading_tests_pack/conftest.py index e2f018257..add558f49 100644 --- a/hera/tests/dynamic_loading_tests_pack/conftest.py +++ b/hera/tests/dynamic_loading_tests_pack/conftest.py @@ -154,6 +154,13 @@ def load_dummy_experiment_to_project(hera_repo_root, dummy_experiment_dir, temp_ "experiment_dir": str(dummy_experiment_dir), } - # Teardown: delete the project's files directory (~/.hera/) + # Teardown: remove ALL DB documents for the project (incl. the hidden + # ``__config__`` doc, so it no longer appears in getProjectList), + # then delete its files directory (~/.hera/). + try: + from hera.datalayer.collection import AbstractCollection + AbstractCollection().deleteDocuments(projectName=temp_project_name) + except Exception: + pass project_files_dir = os.path.join(os.path.expanduser("~"), ".hera", temp_project_name) shutil.rmtree(project_files_dir, ignore_errors=True) diff --git a/hera/tests/test_datalayer.py b/hera/tests/test_datalayer.py index 78f1d15cf..ded360c27 100644 --- a/hera/tests/test_datalayer.py +++ b/hera/tests/test_datalayer.py @@ -34,10 +34,21 @@ # --------------------------------------------------------------------------- def _mongo_is_available(): - """Return True if the configured MongoDB server is reachable.""" + """Return True if the configured MongoDB server is reachable. + + Uses a direct pymongo ping with a 1 s server-selection timeout so that + collection-time probing does not hang for 30 s when Mongo is down. 
+ """ try: - p = Project(projectName="defaultProject") - list(p.getMeasurementsDocuments()) + import pymongo + from hera.datalayer.document import getMongoConfigFromJson + cfg = getMongoConfigFromJson() + host = cfg.get("dbIP", "localhost") + port = int(cfg.get("port", 27017)) + client = pymongo.MongoClient( + host=host, port=port, serverSelectionTimeoutMS=1000 + ) + client.server_info() return True except Exception: return False diff --git a/hera/tests/test_experiment.py b/hera/tests/test_experiment.py index 5284f1088..f84464fa3 100644 --- a/hera/tests/test_experiment.py +++ b/hera/tests/test_experiment.py @@ -31,10 +31,21 @@ # --------------------------------------------------------------------------- def _mongo_is_available(): + """Return True if the configured MongoDB server is reachable. + + Uses a direct pymongo ping with a 1 s server-selection timeout so that + collection-time probing does not hang for 30 s when Mongo is down. + """ try: - from hera.datalayer.project import Project - p = Project(projectName="defaultProject") - list(p.getMeasurementsDocuments()) + import pymongo + from hera.datalayer.document import getMongoConfigFromJson + cfg = getMongoConfigFromJson() + host = cfg.get("dbIP", "localhost") + port = int(cfg.get("port", 27017)) + client = pymongo.MongoClient( + host=host, port=port, serverSelectionTimeoutMS=1000 + ) + client.server_info() return True except Exception: return False diff --git a/hera/toolkit.py b/hera/toolkit.py index c1cb853ae..3c44b86e9 100644 --- a/hera/toolkit.py +++ b/hera/toolkit.py @@ -107,7 +107,7 @@ def projectName(self): return self._projectName def __init__(self, toolkitName: str, projectName: Optional[str] = None, - connectionName: Optional[str] = None, filesDirectory: Optional[str] = None): + connectionName: Optional[str] = None, filesDirectory: Optional[str] = None, **kwargs): """ Initialize a new toolkit. 
@@ -142,32 +142,38 @@ def classLoggerName(self): # Document overrides — automatically tag with toolkit name # ------------------------------------------------------------------ - def addCacheDocument(self, resource="", dataFormat="string", type="", desc={}): + def addCacheDocument(self, resource="", dataFormat="string", type="", desc=None): """ Add a cache document, automatically tagging it with the toolkit name. See ``Project.addCacheDocument`` for parameter details. """ + if desc is None: + desc = {} if self.toolkitName is not None: desc.setdefault(TOOLKIT_TOOLKITNAME_FIELD, self.toolkitName) return super().addCacheDocument(resource, dataFormat, type, desc) - def addMeasurementsDocument(self, resource="", dataFormat="string", type="", desc={}): + def addMeasurementsDocument(self, resource="", dataFormat="string", type="", desc=None): """ Add a measurements document, automatically tagging it with the toolkit name. See ``Project.addMeasurementsDocument`` for parameter details. """ + if desc is None: + desc = {} if self.toolkitName is not None: desc.setdefault(TOOLKIT_TOOLKITNAME_FIELD, self.toolkitName) return super().addMeasurementsDocument(resource, dataFormat, type, desc) - def addSimulationsDocument(self, resource="", dataFormat="string", type="", desc={}): + def addSimulationsDocument(self, resource="", dataFormat="string", type="", desc=None): """ Add a simulations document, automatically tagging it with the toolkit name. See ``Project.addSimulationsDocument`` for parameter details. 
""" + if desc is None: + desc = {} if self.toolkitName is not None: desc.setdefault(TOOLKIT_TOOLKITNAME_FIELD, self.toolkitName) return super().addSimulationsDocument(resource, dataFormat, type, desc) @@ -318,14 +324,6 @@ def getDataSourceDocument(self, datasourceName: Optional[str], version=None, **f docList = [doc for doc in docList if doc['desc']['version'] == latestVersion] ret = docList[0] - # No default was set and multiple versions exist — persist the - # latest version as the default so subsequent calls are stable. - if version is None and datasourceName is not None: - try: - self.setConfig(**{f"{datasourceName}_defaultVersion": latestVersion}) - except Exception: - pass - return ret def getToolkitDocument(self, toolkit_name: str): @@ -636,7 +634,7 @@ def __init__(self, projectName: Optional[str] = None, filesDirectory: Optional[s type="simulations", ), OF_LSM=dict( - cls="hera.simulations.openFoam.LSM.toolkit.OFLSMToolkit", + cls="hera.simulations.openFoam.lagrangian.LSM.toolkit.OFLSMToolkit", desc=None, type="simulations", ), @@ -801,13 +799,29 @@ def getToolkit(self, toolkitName: str, filesDirectory: Optional[str] = None, **k # ------------------------------------------------------------ # Add toolkit path to sys.path (highest priority) + # Validate before inserting: must be an existing directory and must + # not shadow well-known stdlib modules [1.6, 3.3] # ------------------------------------------------------------ - if toolkitPath in sys.path: + _toolkit_abs = os.path.abspath(toolkitPath) + if not os.path.isdir(_toolkit_abs): + raise ValueError( + f"Dynamic toolkit path does not exist: {_toolkit_abs!r}. " + "Only real directories may be added to sys.path." 
+ ) + _stdlib_shadows = {"json", "logging", "os", "sys", "collections", "io", "re"} + for _mod in _stdlib_shadows: + if os.path.isfile(os.path.join(_toolkit_abs, f"{_mod}.py")): + raise ValueError( + f"Dynamic toolkit path {_toolkit_abs!r} contains {_mod}.py " + "which would shadow the standard library — loading aborted." + ) + if _toolkit_abs in sys.path: try: - sys.path.remove(toolkitPath) + sys.path.remove(_toolkit_abs) except ValueError: pass - sys.path.insert(0, toolkitPath) + sys.path.insert(0, _toolkit_abs) + toolkitPath = _toolkit_abs # self.logger.debug(f"Toolkit path (raw): {toolkitPath_raw}") # self.logger.debug(f"Toolkit path (resolved): {toolkitPath}") diff --git a/hera/utils/data/CLI.py b/hera/utils/data/CLI.py index e673ca32c..d254fc2a3 100644 --- a/hera/utils/data/CLI.py +++ b/hera/utils/data/CLI.py @@ -343,7 +343,7 @@ def display_datasource_versions(arguments): datasources.append(d) else: datasources.append(d) - except: + except Exception: pass else: config = proj.getConfig() @@ -365,7 +365,7 @@ def display_datasource_versions(arguments): d['DEFAULT_VERSION'] = default_version datasources.append(d) - except: + except Exception: pass if len(datasources) != 0: diff --git a/hera/utils/latex.py b/hera/utils/latex.py index 5c689d9d7..5b530f17a 100644 --- a/hera/utils/latex.py +++ b/hera/utils/latex.py @@ -220,10 +220,4 @@ def convert(self): return "\n".join([self._first_last_Lines[0]] + final + [self._first_last_Lines[1]]) -if __name__ == "__main__": - bb = bibtexFile("hebrewCrossTex/output.bbl") - print(bb.convert()) - - - diff --git a/hera/utils/query.py b/hera/utils/query.py index afd791ccb..88ada8fc7 100644 --- a/hera/utils/query.py +++ b/hera/utils/query.py @@ -1,4 +1,6 @@ -def andClause(excludeFields=[], **kwargs): +def andClause(excludeFields=None, **kwargs): + if excludeFields is None: + excludeFields = [] """ Builds a pandas query str Parameters diff --git a/hera/utils/unitHandler.py b/hera/utils/unitHandler.py index 4c040a10c..8e27ff11d 100644 
--- a/hera/utils/unitHandler.py +++ b/hera/utils/unitHandler.py @@ -289,17 +289,14 @@ def unumToStr(obj): ret = str(obj) return ret - @deprecated(reason="Doesn't work for some cases") + @deprecated(reason="Cannot safely evaluate arbitrary strings; use ureg.Quantity directly") def strToUnum(value): """Convert a string to a Unum object.""" if isinstance(value, Unum): - ret = value - else: - try: - ret = eval(str(value)) - except: - ret = value - return ret + return value + # eval() removed: arbitrary code execution risk [1.5] + # Callers should migrate to ureg.Quantity(value) + return value def extractUnumUnitsFromPint(pint_quantity): """Extract unum unit equivalent from a pint Quantity.""" diff --git a/init_with_mongo.sh b/init_with_mongo.sh index 102bbbe49..b9beb324a 100755 --- a/init_with_mongo.sh +++ b/init_with_mongo.sh @@ -53,6 +53,8 @@ done # 6. Create config.json CONFIG_FILE="${PYHERA_DIR}/config.json" +MONGO_HERA_USER="${MONGO_HERA_USER:-hera}" +MONGO_HERA_PWD="${MONGO_HERA_PWD:-heracles}" if [ -f "${CONFIG_FILE}" ]; then echo "config.json already exists at ${CONFIG_FILE}, skipping creation." 
else @@ -62,8 +64,8 @@ else "${SYSTEM_USER}": { "dbIP": "127.0.0.1", "dbName": "olymp", - "password": "heracles", - "username": "hera" + "password": "${MONGO_HERA_PWD}", + "username": "${MONGO_HERA_USER}" } } EOF diff --git a/meta.yaml b/meta.yaml index 3130f8bcf..2a58eea6d 100644 --- a/meta.yaml +++ b/meta.yaml @@ -1,9 +1,9 @@ package: name: "hera" - version: "0.5.0" + version: "2.16.3" source: - git_url: http://mathsrv2:8081/edenn/pyhera.git + git_url: https://github.com/KaplanOpenSource/hera requirements: build: @@ -13,11 +13,19 @@ requirements: run: - python >=3.9 - - dask >=2.9 - - mongoengine >=0.18 - - xarray >=0.14 - - pytables >=3.4 - - scipy >=1.3 + - pandas >=1.3 + - numpy >=1.21 + - mongoengine >=0.24 + - pymongo >=3.12 + - pint >=0.19 + - deprecated >=1.2 + - scipy >=1.7 + - xarray >=0.20 + - dask >=2021.10 + - geopandas >=0.10 + - shapely >=1.8 about: - summary: "Hera" + summary: "Hera — Python scientific data management platform (GIS, meteorology, simulations, risk assessment)" + home: https://github.com/KaplanOpenSource/hera + license: MIT diff --git a/mongo-init.d/50-create-users.js b/mongo-init.d/50-create-users.js index fbf17e19b..69eca00d1 100644 --- a/mongo-init.d/50-create-users.js +++ b/mongo-init.d/50-create-users.js @@ -2,21 +2,25 @@ // use admin; db = db.getSiblingDB("admin"); -db.getUser("MathAdmin") || db.createUser( +var adminUser = process.env.MONGO_ADMIN_USER || "MathAdmin"; +var adminPwd = process.env.MONGO_ADMIN_PWD || "MathAdmin"; +var heraUser = process.env.MONGO_HERA_USER || "hera"; +var heraPwd = process.env.MONGO_HERA_PWD || "heracles"; + +db.getUser(adminUser) || db.createUser( { - user: "MathAdmin", - pwd: "MathAdmin", + user: adminUser, + pwd: adminPwd, roles: [ { role: "userAdminAnyDatabase", db: "admin" } , "readWriteAnyDatabase"] } ); // use admin; -db.getUser("hera") || db.createUser( +db.getUser(heraUser) || db.createUser( { - user: "hera", - pwd: "heracles", + user: heraUser, + pwd: heraPwd, roles: [ { role: 
"readWrite", db: "olymp" } ] - } ); diff --git a/setup.py b/setup.py index 12922fa22..dec68eb83 100644 --- a/setup.py +++ b/setup.py @@ -1,8 +1,19 @@ import glob +import re +from pathlib import Path from setuptools import setup, find_packages +# Read version from the package without importing it +_version_match = re.search( + r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", + Path("hera/__init__.py").read_text(encoding="utf-8"), + re.M, +) +_VERSION = _version_match.group(1) if _version_match else "0.0.0" + setup( name="pyhera", + version=_VERSION, url="https://github.com/KaplanOpenSource/hera", packages=find_packages(), author="Yehuda Arav", @@ -16,6 +27,19 @@ "Operating System :: POSIX :: Linux", ], python_requires=">=3.9", + install_requires=[ + "pandas>=1.3", + "numpy>=1.21", + "mongoengine>=0.24", + "pymongo>=3.12", + "pint>=0.19", + "deprecated>=1.2", + "scipy>=1.7", + "xarray>=0.20", + "dask>=2021.10", + "geopandas>=0.10", + "shapely>=1.8", + ], scripts=[s for s in glob.glob("hera/bin/hera-*") if not s.endswith(".old")], extras_require={ "rag": [ diff --git a/ui/client/TEST_UI.md b/ui/client/TEST_UI.md index d3ee7ff6a..5679a9a29 100644 --- a/ui/client/TEST_UI.md +++ b/ui/client/TEST_UI.md @@ -1,23 +1,23 @@ # UI Validation Checklist -Run these steps **in order** from the repo root (`/home/eran/Code/hera`) to validate the UI client. +Run these steps **in order** from the repo root to validate the UI client. ## 1. TypeScript type-checking ```bash -cd /home/eran/Code/hera/ui/client && npx tsc --noEmit +(cd ui/client && npx tsc --noEmit) ``` ## 2. Unit tests ```bash -cd /home/eran/Code/hera/ui/client && npm run test +(cd ui/client && npm run test) ``` ## 3. Production build ```bash -cd /home/eran/Code/hera/ui/client && npm run build +(cd ui/client && npm run build) ``` ## 4. Clean build artifacts @@ -25,13 +25,13 @@ cd /home/eran/Code/hera/ui/client && npm run build **CRITICAL: Always run from repo root.** The build creates new hash-named files and modifies buildNumber.ts.
Both must be reverted. ```bash -cd /home/eran/Code/hera && rm -f ui/client/bundle/assets/index-*.js ui/client/bundle/assets/index-*.css && git checkout -- ui/client/bundle/ ui/client/src/buildNumber.ts +rm -f ui/client/bundle/assets/index-*.js ui/client/bundle/assets/index-*.css && git checkout -- ui/client/bundle/ ui/client/src/buildNumber.ts ``` ## 5. Verify clean state ```bash -cd /home/eran/Code/hera && git status ui/client/bundle/ ui/client/src/buildNumber.ts +git status ui/client/bundle/ ui/client/src/buildNumber.ts ``` Must show "nothing to commit, working tree clean" with no untracked files.