Skip to content

fix: Fable5 scan report fixes Part 1 refs #953#955

Open
ilayfalach wants to merge 23 commits into
masterfrom
issue953
Open

fix: Fable5 scan report fixes Part 1 refs #953#955
ilayfalach wants to merge 23 commits into
masterfrom
issue953

Conversation

@ilayfalach

Copy link
Copy Markdown
Collaborator

Summary

Implements 32 of 51 findings from the Fable5 automated code-scan report. All Critical (🔴) and Major (🟠) items that could be fixed without a full architectural redesign are addressed. Full status tracked in SCAN_STATUS.md.

Security (all 7 items fixed)

  • Remove config.json from git tracking (live MongoDB credentials were committed) [1.1]
  • Mask password field before debug logging [1.2]
  • Parameterize bootstrap credentials via env vars (mongo-init.d, dockerfile, init_with_mongo.sh) [1.3]
  • Replace all os.system(f-string) with subprocess.run([list]), os.symlink, shutil.* across 5 simulation files [1.4]
  • Remove eval() on data-controlled strings (parsers.py dict counter; unitHandler.py deprecated) [1.5]
  • Validate sys.path before inserting DB-supplied paths (existence + stdlib-shadow check); replace sys.path.insert in class handler with importlib.util; annotate unavoidable pickle usage with # nosec B301 [1.6]
  • Convert remaining subprocess with shell=True to argument-list form [1.7]

Data Layer (7 of 13 fixed)

  • Fix mutable default desc={} / getDataParams={} / actionList=[] across 6 files [2.2]
  • Remove hardcoded absolute developer paths from shipped files (srtm_datasource.json, latex.py, ml.py) [2.3]
  • Fix getAllDocuments silently returning empty results (desc=desc**desc) [2.5]
  • Fix getCacheDcouments typo → getCacheDocuments [2.7]
  • Replace inline angle-math lambdas with hera.utils.toMeteorologicalAngle/toMathematicalAngle [2.9]
  • Replace raw EPSG integers with WSG84/ITM named constants across 4 files [2.10]

Architecture (6 of 11 fixed)

  • Correct broken OF_LSM registry cls path [3.1]
  • Remove hidden DB setConfig() write from getDataSourceDocument getter [3.4]
  • Add **kwargs to abstractToolkit.__init__ [3.6]
  • Remove circular import datalayer → toolkit [3.7]

Code Quality (3 of 6 fixed)

  • Replace bare except: with except Exception: in windProfile/toolkit.py and utils/data/CLI.py [4.2]

Testing & CI (3 of 6 fixed / 1 confirmed not an issue)

  • Item 5.1 confirmed not an issue (already fixed in Full test suite pass CICD on pull requests #884)
  • Replace slow Project-based MongoDB probe with pymongo.MongoClient(serverSelectionTimeoutMS=1000) [5.3]
  • Remove outer try/except from compare_outputs that was swallowing test failures [5.4]

Packaging & Hygiene (6 of 8 fixed)

  • Rewrite README to accurately describe Hera (was Django/GIS boilerplate) [6.1]
  • Read version= dynamically from hera/__init__.__version__ in setup.py [6.2]
  • Add 11 core runtime dependencies to setup.py install_requires [6.3]
  • Fix hardcoded /home/eran paths in ui/client/TEST_UI.md [6.4]
  • Untrack 31 MB of conda tarballs with git rm --cached; add pattern to .gitignore [6.5]
  • Update meta.yaml version, git URL, and dependencies [6.6]

Postponed items

Two Critical items are not in this PR due to architectural scope:

Full item-by-item status in SCAN_STATUS.md.

Test plan

  • pytest hera/tests/ passes (or skips cleanly when MongoDB is unavailable)
  • git ls-files config.json returns nothing
  • grep -rn "os\.system(" hera/simulations/ returns no live hits
  • grep -rn "getCacheDcouments" hera/ returns no hits
  • python -c "import hera; from hera import Project" succeeds without circular import error

🤖 Generated with Claude Code

Ilay Falach and others added 22 commits June 22, 2026 16:29
Tests were leaking MongoDB projects and on-disk directories:

- A project stays in getProjectList() as long as any document carries
  its name, including the hidden <projectName>__config__ Cache document
  created on every Project() construction. Teardowns deleted only
  measurement docs, so the config doc kept every project alive.
- Project.__init__ unconditionally creates ~/.hera/<projectName>; nothing
  removed it.
- The session fixture persisted an ephemeral mkdtemp path into the durable
  DB config, so later Project() opens resurrected the tmp dir via makedirs.

Fix:
- Add purge primitives that delete all docs (incl. the config doc, via the
  collection layer) and remove on-disk dirs.
- Add a session-scoped autouse safety-net that purges every project created
  during the session plus leftover test-named projects/dirs. Only touches
  test artifacts; defaultProject and pre-existing data are never modified.
- Pass filesDirectory at construction so ~/.hera/<name> is never created,
  and purge the config doc on teardown so tmp dirs cannot be resurrected.
- dynamic_loading conftest now purges the DB, not just the disk dir.

Verified: 275 passed; zero new DB/dir/tmp artifacts; idempotent across runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


config.json contains personal MongoDB credentials committed to the public
repo. Untrack with git rm --cached; add to .gitignore to prevent future
accidental commits. Password rotation and git history scrub (git filter-repo)
are required follow-up steps by the repo maintainer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Debug-level logging of the full mongoConfig dict exposes plaintext
passwords to log files. Replace with a filtered copy that omits 'password'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MONGO_ADMIN_USER/PWD and MONGO_HERA_USER/PWD env vars now control the
credentials provisioned by the init scripts and Dockerfile, with the old
hardcoded values as defaults. Production deployments must override these.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…link [1.4] refs #953

~16 os.system() call sites across 6 files replaced:
- Luigi execution: subprocess.run([list]) instead of f-string shell
- symlinks: os.symlink() instead of ln -s
- rm -r: shutil.rmtree()
- cp -rf *: glob + shutil.copytree/copy2
- foamJob: subprocess.run([list]) with flag handling
- sed pipeline: subprocess.run([list]) without shell=True
- ./Allrun, ./a.out: subprocess.run([list])

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rToUnum [1.5, 4.1] refs #953

parsers.py: eval(f'count_{inst}') replaced with a plain dict counter;
eliminates arbitrary code execution from metadata instrument names.

unitHandler.py: eval(str(value)) removed from the deprecated strToUnum;
method now returns value unchanged with a migration note.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… handler without sys.path mutation [1.6, 3.3] refs #953

toolkit.py: validate toolkitPath exists and doesn't shadow stdlib modules
before sys.path.insert(0, ...).

datahandler.py: Class getData() now uses importlib.util.spec_from_file_location
for resource-based fallback — sys.path is never mutated by DB-supplied paths.

project.py: pickle.dump/load annotated with # nosec B301 and a migration note;
full JSON replacement deferred to Part 2 to avoid breaking the export format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#953

Replace all mutable default-argument literals (`desc={}`, `getDataParams={}`,
`actionList=[]`, `excludeFields=[]`) with `None` + guard idiom across:
- hera/toolkit.py (addCacheDocument, addMeasurementsDocument, addSimulationsDocument)
- hera/datalayer/project.py (same three methods)
- hera/datalayer/datahandler.py (all 15 getData signatures + two body guards)
- hera/datalayer/autocache.py (cacheFunction + cacheDecorators.__init__)
- hera/utils/query.py (andClause)
- hera/riskassessment/protectionpolicy/ProtectionPolicy.py (__init__)

Prevents shared-state bugs where callers mutate the default dict/list across calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- srtm_datasource.json: add isRelativePath=True and convert absolute path
  to repo-relative path (hera/tests/UNIT_TEST_GIS_RASTER_TOPOGRAPHY/)
- latex.py: remove __main__ block containing /home/yehudaa/... path
- ml.py: remove __main__ block containing /ibdata2/nirb/... paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…2.7] refs #953

- project.py getAllDocuments: pass **desc instead of desc=desc so filter
  kwargs are correctly forwarded to each collection's getDocuments query
- topography.py:298: fix getCacheDcouments → getCacheDocuments (typo)
  and spread the desc dict as **kwargs to match the correct signature

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…refs #953

- riskAreas.py: remove duplicate toMeteorologicalAngle/toMathematicalAngle
  lambdas; import the canonical implementations from hera.utils instead
- turbulencestatistics.py: replace two apply(lambda x: 270-x ...) calls
  with apply(toMeteorologicalAngle) using the same import

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…refs #953

- wrfDatalayer.py: import WSG84/ITM from hera.measurements.GIS; replace
  raw 2039 and 4326 literals (geo.crs=2039, to_crs(epsg=4326), etc.)
- thresholdGeoDataFrame.py: import ITM; replace {"init":"epsg:2039"} dict
  (old geopandas syntax) with ITM integer constant
- buildings/analysis.py: import ITM; replace crs=2039 in __main__ block
- topography.py: import ITM; replace desc.update({'crs': 2039})

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…#953

- toolkit.py OF_LSM registry: fix cls path from
  hera.simulations.openFoam.LSM.toolkit to
  hera.simulations.openFoam.lagrangian.LSM.toolkit.OFLSMToolkit
- getDataSourceDocument: remove setConfig() side-effect; a read-only
  method must not write to the DB — callers who need a persisted default
  should call setDataSourceDefaultVersion() explicitly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
….6, 3.7] refs #953

- abstractToolkit.__init__: add **kwargs so subclass constructors can pass
  extra keyword arguments without triggering TypeError on unexpected params
- project.py: remove 'from hera import toolkit' circular import; the name
  was only referenced in docstring strings, never in actual code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bare except: catches KeyboardInterrupt, SystemExit, and GeneratorExit —
signals that should propagate. Narrowed to except Exception: in the two
affected locations:
- windProfile/toolkit.py: IMS API retry loop
- utils/data/CLI.py: datasource listing (two occurrences)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…refs #953 refs #943

5.3: Replace slow Project-based module-level MongoDB probe with a direct
pymongo.MongoClient ping (serverSelectionTimeoutMS=1000) in both
test_datalayer.py and test_experiment.py. The old probe triggered
pymongo's 30 s server-selection timeout per test module during collection
when MongoDB was down, freezing CI. The fast probe returns in ≤1 s.

5.4: Remove outer try-except in compare_outputs that swallowed all
comparison crashes and silently returned False. Comparison errors now
propagate as test errors instead of silent mismatches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace AI-generated boilerplate that described Hera as a 'Django-based
GIS web system' (completely wrong) with an accurate description of the
actual stack: Python 3, MongoDB/mongoengine, pandas/dask/geopandas/xarray,
domain-specific toolkits (GIS, meteorology, simulations, risk assessment).
Remove the 'Live Demo' link that pointed to the generic Kaplan org page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…] refs #953

6.2: Read version dynamically from hera/__init__.__version__ using regex
so pip reports the correct version (2.16.3) instead of 0.0.0.

6.3: Add install_requires with the 11 core runtime dependencies that every
hera import path needs (pandas, numpy, mongoengine, pymongo, pint,
deprecated, scipy, xarray, dask, geopandas, shapely). This lets
'pip install pyhera' work for end-users without a separate
requirements.txt step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… 6.5, 6.6] refs #953

6.4: TEST_UI.md: replace all 6 occurrences of '/home/eran/Code/hera'
with relative paths so every developer's machine works correctly.

6.5: Untrack 3 conda tarballs from git (gdal, icu, nodejs for Python 3.6,
~31 MB). Add *.tar.bz2 and *.conda under 'additional installations/'
to .gitignore to prevent re-adding. Files kept on disk.

6.6: meta.yaml: bump version to 2.16.3, replace dead internal git URL
with public GitHub URL, expand run dependencies to match setup.py
install_requires, remove pytables (absent from requirements.txt).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…#953

Documents every finding from the 2026-06-11 Fable5 code review with one
of three statuses: Fixed (32 items), Not an Issue (1 item), or Postponed
(18 items). Provides commit references for all fixes and rationale for
postponed items.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…port refs #953

- hermesWorkflowToolkit.py (main + LSM): combine master's
  buildLuigiExecutionCommand (scheduler/dispatch_id support) with
  our subprocess.run([shlex.split(...)]) security fix instead of os.system
- latex.py: keep our __main__ removal (hardcoded dev path) over master's revert

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…topography [2.10] refs #953

buildings/analysis.py: from .... import ITM → from ...utils import ITM
  (4 dots reached hera.measurements which does not export ITM;
   3 dots + .utils reaches hera.measurements.GIS.utils which does)

topography.py: from ... import ITM → from ..utils import ITM
  (same root cause, one fewer level up from GIS/vector/)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… [1.6, 3.3] refs #953

When resource points to the top-level package dir (e.g. /path/mypkg)
and classpath is mypkg.impl.MyClass, the old code built:
  /path/mypkg/mypkg/impl.py   ← double package name, file not found

Fix: strip the leading package component from the file path when
os.path.basename(resource) matches the first dotted component of module_name.
Tested by test_dynamic_class_loading_via_getdata.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant