Conversation


@sbryngelson commented Dec 18, 2025

PR Type

Enhancement, Bug fix, Tests, Documentation


Description

  • Added trained neural network models: Introduced two production-ready TBNN models (tbnn_channel_caseholdout and tbnn_phll_caseholdout) with complete weight matrices, biases, and normalization parameters for channel flow and periodic hills cases

  • Removed legacy example models: Cleaned up deprecated example_tbnn and example_scalar_nut models and their associated weight files from the repository

  • Enhanced configuration validation: Implemented explicit NN model specification requirements in config.cpp with improved error messages and automatic path mirroring, removing legacy fallback to default data/ directory

  • Fixed GEP parameter order bug: Corrected parameter order in set_reference() calls in turbulence_gep.cpp from (nu_, u_ref_, delta_) to (nu_, delta_, u_ref_)

  • Expanded test coverage: Added comprehensive test suites for feature computation (test_features.cpp), turbulence model validation (test_turbulence_features.cpp), and generalized backend execution tests supporting both CPU and GPU

  • Improved backend flexibility: Refactored GPU-specific tests to support graceful CPU fallback when GPU is unavailable, with clear backend status reporting

  • Updated all tests to use trained models: Migrated test suite from legacy example models to trained tbnn_channel_caseholdout weights across integration, backend, and perturbed channel tests

  • Enhanced documentation: Created comprehensive usage guides and metadata files for trained models, including architecture details, training configuration, feature descriptions, and usage examples

  • Improved dataset handling: Enhanced McConkey dataset download robustness with better Kaggle CLI invocation and fallback unzip mechanisms; added local training script for TBNN/MLP models


Diagram Walkthrough

flowchart LR
  A["Legacy Models<br/>example_tbnn<br/>example_scalar_nut"] -->|"Removed"| B["Trained Models<br/>tbnn_channel_caseholdout<br/>tbnn_phll_caseholdout"]
  B -->|"Used in"| C["Updated Tests<br/>test_features<br/>test_turbulence_features<br/>test_backend_execution"]
  D["Config Validation<br/>Explicit Model Spec"] -->|"Enforces"| B
  E["GEP Bug Fix<br/>Parameter Order"] -->|"Corrects"| F["Feature Computation"]
  C -->|"Validates"| F
  G["Enhanced Documentation<br/>USAGE.md<br/>metadata.json"] -->|"Describes"| B

File Walkthrough

Relevant files
Tests
7 files
test_features.cpp
Expand feature tests with analytic velocity field verification

tests/test_features.cpp

  • Replaced generic velocity gradient tests with analytic verification
    tests for pure shear, pure strain, and solid body rotation flows
  • Added comprehensive feature computation tests including scalar
    features, TBNN features, and tensor basis operations
  • Implemented anisotropy tensor construction and Reynolds stress
    conversion tests
  • Added GPU/CPU gradient computation backend tests with conditional
    compilation
+413/-49
test_turbulence_features.cpp
Add comprehensive turbulence model feature tests                 

tests/test_turbulence_features.cpp

  • New test file exercising turbulence model computation paths with
    nontrivial velocity gradients
  • Tests EARSM Re_t-based blending, baseline model response, GEP model,
    and multiple EARSM variants (Wallin-Johansson, Gatski-Speziale, Pope)
  • Includes batch feature computation tests, realizability constraint
    verification, and solver backend execution
  • Supports both CPU and GPU backends with conditional compilation
+539/-0 
test_cpu_gpu_consistency.cpp
Improve GPU/CPU consistency tests with fallback paths       

tests/test_cpu_gpu_consistency.cpp

  • Refactored to gracefully handle GPU unavailability in GPU-enabled
    builds
  • Changed from skip-on-no-GPU to CPU-fallback behavior for consistency
    tests
  • Updated test titles and output messages to reflect CPU/GPU backend
    status
  • Improved model loading to use trained tbnn_channel_caseholdout instead
    of legacy example models
+132/-85
test_backend_execution.cpp
Generalize backend execution tests for CPU and GPU             

tests/test_backend_execution.cpp

  • Renamed from GPU-specific to backend-agnostic execution tests
  • Added CPU path support for all tests with conditional compilation
  • Updated model loading to use trained TBNN weights
    (tbnn_channel_caseholdout)
  • Improved test output to clearly indicate which backend (CPU/GPU) is
    being tested
+128/-131
test_nn_integration.cpp
Update NN integration tests to use trained models               

tests/test_nn_integration.cpp

  • Updated model paths from legacy example_scalar_nut and example_tbnn to
    trained tbnn_channel_caseholdout
  • Simplified model loading to use consistent trained weights across all
    NN integration tests
+10/-9   
test_perturbed_channel.cpp
Update perturbed channel test with trained model paths     

tests/test_perturbed_channel.cpp

  • Updated turbulence model creation to use trained
    tbnn_channel_caseholdout model
  • Simplified model path logic to select appropriate trained weights
    based on model type
+5/-1     
CMakeLists.txt
Test infrastructure updates and new turbulence features test

CMakeLists.txt

  • Renamed test_gpu_execution to test_backend_execution for
    backend-agnostic testing
  • Renamed test case GPUExecutionTest to BackendExecutionTest
  • Renamed test case CPUGPUConsistencyTest to ConsistencyTest
  • Added new test executable test_turbulence_features for analytic
    validation of features, invariants, and model response
+9/-4     
Configuration changes
2 files
config.cpp
Add NN model configuration validation and error handling 

src/config.cpp

  • Added validation requiring explicit NN model specification (preset or
    weights/scaling paths)
  • Improved error messages listing available presets and configuration
    options
  • Implemented automatic path mirroring when only weights or scaling is
    provided
  • Removed legacy fallback to default data/ directory
+31/-9   
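The validation and path-mirroring flow described above can be sketched as follows. This is a hypothetical Python analogue of the C++ logic, not the actual `Config::finalize()` code; the option names (`preset`, `weights_path`, `scaling_path`) and the preset set are assumptions based on the PR description:

```python
# Illustrative sketch of explicit NN model validation with path mirroring.
# Names and preset list are assumptions, not the exact src/config.cpp code.
PRESETS = {"tbnn_channel_caseholdout", "tbnn_phll_caseholdout"}

def finalize_nn_config(preset="", weights_path="", scaling_path=""):
    if preset:
        if preset not in PRESETS:
            raise ValueError(
                f"Unknown NN preset '{preset}'. Available: {sorted(PRESETS)}")
        base = f"data/models/{preset}/"
        return base, base
    # Path mirroring: if only one of weights/scaling is given, reuse it.
    if weights_path and not scaling_path:
        scaling_path = weights_path
    elif scaling_path and not weights_path:
        weights_path = scaling_path
    if not (weights_path and scaling_path):
        raise ValueError(
            "NN model must be specified explicitly: pass a preset "
            f"({sorted(PRESETS)}) or weights/scaling paths. "
            "No legacy fallback to data/ exists.")
    return weights_path, scaling_path
```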
config.hpp
Update config header with explicit NN model requirements 

include/config.hpp

  • Changed NN model path defaults from data/ to empty strings (require
    explicit specification)
  • Updated documentation to indicate preset model names (e.g.,
    tbnn_channel_caseholdout)
  • Clarified that no legacy fallback paths are available
+4/-4     
Bug fix
1 file
turbulence_gep.cpp
Fix parameter order in GEP feature computer initialization

src/turbulence_gep.cpp

  • Fixed parameter order in set_reference() calls from (nu_, u_ref_,
    delta_) to (nu_, delta_, u_ref_)
  • Applied fix in both ensure_initialized() and update() methods
+2/-2     
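The bug fixed here is a positional-argument swap: both orderings compile and run, which is why it went unnoticed. A minimal Python illustration of why the order matters (the `set_reference(nu, delta, u_ref)` signature is taken from the fix; the derived shear scale is an assumed example, not the actual GEP code):

```python
class FeatureComputer:
    """Minimal stand-in for the GEP feature computer's reference state."""
    def set_reference(self, nu, delta, u_ref):
        # Positional order matters: swapping delta and u_ref silently
        # corrupts any derived, non-symmetric scale.
        self.nu, self.delta, self.u_ref = nu, delta, u_ref

    def shear_scale(self):
        # Example reference shear rate used to nondimensionalize gradients
        return self.u_ref / self.delta

fc = FeatureComputer()
nu, delta, u_ref = 1e-5, 1.0, 2.0
fc.set_reference(nu, u_ref, delta)   # buggy order: (nu_, u_ref_, delta_)
buggy = fc.shear_scale()
fc.set_reference(nu, delta, u_ref)   # fixed order: (nu_, delta_, u_ref_)
fixed = fc.shear_scale()
```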
Enhancement
17 files
train_realdata_local.sh
Add local training script for real McConkey dataset           

scripts/train_realdata_local.sh

  • New comprehensive training script for local TBNN/MLP training on
    McConkey dataset
  • Handles venv creation in scratch, dataset download via Kaggle API, and
    model export
  • Supports configurable training parameters (epochs, batch size,
    learning rate, device)
  • Includes dataset validation and helpful error messages for missing
    credentials
+259/-0 
download_mcconkey_data.sh
Improve McConkey dataset download robustness                         

scripts/download_mcconkey_data.sh

  • Improved Kaggle CLI invocation to handle versions without __main__
    support
  • Added explicit zip file handling with fallback to pure-Python unzip
  • Removed redundant kaggle CLI availability check
  • Enhanced error messages for missing credentials
+18/-19 
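The "fallback to pure-Python unzip" mentioned above can be done entirely with the standard library; a hedged sketch of the idea (the actual script is bash, and this helper name is illustrative):

```python
import subprocess
import zipfile

def unzip_with_fallback(zip_path, out_dir):
    """Try the system `unzip` binary first; fall back to Python's zipfile
    module on hosts where `unzip` is missing or fails."""
    try:
        subprocess.run(
            ["unzip", "-q", str(zip_path), "-d", str(out_dir)],
            check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Pure-Python fallback; creates out_dir as needed.
        with zipfile.ZipFile(zip_path) as z:
            z.extractall(out_dir)
```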
layer0_W.txt
TBNN channel flow model input layer weights                           

data/models/tbnn_channel_caseholdout/layer0_W.txt

  • Added new neural network weight matrix file for TBNN channel flow
    model
  • Contains 64 rows of 5-dimensional weight vectors (input layer: 5 → 64
    neurons)
  • Weights stored in scientific notation format for numerical precision
+64/-0   
layer3_W.txt
TBNN periodic hills model output layer weights                     

data/models/tbnn_phll_caseholdout/layer3_W.txt

  • Added output layer weight matrix for TBNN periodic hills model
  • Contains 4 rows of 64-dimensional vectors (output layer: 64 → 4
    coefficients)
  • Represents final transformation to tensor basis coefficients
+4/-0     
layer3_W.txt
TBNN channel flow model output layer weights                         

data/models/tbnn_channel_caseholdout/layer3_W.txt

  • Added output layer weight matrix for TBNN channel flow model
  • Contains 4 rows of 64-dimensional vectors (output layer: 64 → 4
    coefficients)
  • Represents final transformation to tensor basis coefficients
+4/-0     
layer0_b.txt
TBNN periodic hills model input layer biases                         

data/models/tbnn_phll_caseholdout/layer0_b.txt

  • Added bias vector for input layer of periodic hills TBNN model
  • Contains 64 bias values (one per neuron in first hidden layer)
  • Stored in scientific notation for numerical precision
+64/-0   
layer1_b.txt
TBNN periodic hills model first hidden layer biases           

data/models/tbnn_phll_caseholdout/layer1_b.txt

  • Added bias vector for first hidden layer of periodic hills TBNN model
  • Contains 64 bias values for hidden layer neurons
  • Stored in scientific notation format
+64/-0   
layer2_b.txt
TBNN periodic hills model second hidden layer biases         

data/models/tbnn_phll_caseholdout/layer2_b.txt

  • Added bias vector for second hidden layer of periodic hills TBNN model
  • Contains 64 bias values for hidden layer neurons
  • Stored in scientific notation format
+64/-0   
layer0_b.txt
TBNN channel flow model input layer biases                             

data/models/tbnn_channel_caseholdout/layer0_b.txt

  • Added bias vector for input layer of channel flow TBNN model
  • Contains 64 bias values (one per neuron in first hidden layer)
  • Stored in scientific notation for numerical precision
+64/-0   
layer1_b.txt
TBNN channel flow model first hidden layer biases               

data/models/tbnn_channel_caseholdout/layer1_b.txt

  • Added bias vector for first hidden layer of channel flow TBNN model
  • Contains 64 bias values for hidden layer neurons
  • Stored in scientific notation format
+64/-0   
layer2_b.txt
TBNN channel flow model second hidden layer biases             

data/models/tbnn_channel_caseholdout/layer2_b.txt

  • Added bias vector for second hidden layer of channel flow TBNN model
  • Contains 64 bias values for hidden layer neurons
  • Stored in scientific notation format
+64/-0   
input_means.txt
Input normalization means for periodic hills TBNN model   

data/models/tbnn_phll_caseholdout/input_means.txt

  • Added input feature normalization means for periodic hills TBNN model
  • Contains 5 mean values for z-score normalization of input features
  • Stored in scientific notation for numerical precision
+5/-0     
input_stds.txt
Input normalization standard deviations for periodic hills TBNN model

data/models/tbnn_phll_caseholdout/input_stds.txt

  • Added input feature normalization standard deviations for periodic
    hills TBNN model
  • Contains 5 standard deviation values for z-score normalization
  • Stored in scientific notation format
+5/-0     
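The z-score normalization these files support is simply (x - mean) / std applied per feature; a minimal sketch with made-up values (the real five means and standard deviations are read from the `input_means.txt` / `input_stds.txt` files above):

```python
def zscore(features, means, stds):
    """Normalize each input feature: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(features, means, stds)]

# Made-up example values; the trained models ship the real ones
# as 5 values each in input_means.txt and input_stds.txt.
means = [0.1, -0.2, 0.0, 1.5, 0.3]
stds  = [1.0,  0.5, 2.0, 3.0, 0.1]
x     = [0.1,  0.3, 4.0, 1.5, 0.4]
normalized = zscore(x, means, stds)
```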
input_means.txt
Input normalization means for channel flow TBNN model       

data/models/tbnn_channel_caseholdout/input_means.txt

  • Added input feature normalization means for channel flow TBNN model
  • Contains 5 mean values for z-score normalization of input features
  • Stored in scientific notation for numerical precision
+5/-0     
input_stds.txt
Input normalization standard deviations for channel flow TBNN model

data/models/tbnn_channel_caseholdout/input_stds.txt

  • Added input feature normalization standard deviations for channel flow
    TBNN model
  • Contains 5 standard deviation values for z-score normalization
  • Stored in scientific notation format
+5/-0     
layer3_b.txt
TBNN periodic hills model output layer biases                       

data/models/tbnn_phll_caseholdout/layer3_b.txt

  • Added bias vector for output layer of periodic hills TBNN model
  • Contains 4 bias values (one per output coefficient)
  • Stored in scientific notation format
+4/-0     
layer3_b.txt
TBNN channel flow model output layer biases                           

data/models/tbnn_channel_caseholdout/layer3_b.txt

  • Added bias vector for output layer of channel flow TBNN model
  • Contains 4 bias values (one per output coefficient)
  • Stored in scientific notation format
+4/-0     
Dependencies
1 file
layer0_W.txt
Add trained TBNN weights for periodic hills case                 

data/models/tbnn_phll_caseholdout/layer0_W.txt

  • New trained weight matrix for periodic hills TBNN model (64 hidden
    units, 5 input features)
  • Contains 64×5 matrix of floating-point weights
+64/-0   
Documentation
6 files
USAGE.md
Add usage guide for trained channel TBNN model                     

data/models/tbnn_channel_caseholdout/USAGE.md

  • Comprehensive documentation for trained channel flow TBNN model
  • Includes architecture details, training configuration, feature
    descriptions, and usage examples
  • Provides comparison with periodic hills variant and references to
    original papers
  • Documents file structure and training metadata
+134/-0 
README.md
Comprehensive documentation for trained neural network models

data/models/README.md

  • Completely restructured documentation to focus on trained models
    instead of legacy examples
  • Added detailed sections for two trained TBNN models:
    tbnn_channel_caseholdout and tbnn_phll_caseholdout
  • Included training details, architecture specifications, validation
    losses, and usage examples
  • Added model selection guide table comparing performance metrics
  • Removed legacy weight export instructions and replaced with
    model-specific usage guidance
+83/-12 
USAGE.md
Detailed usage guide for periodic hills TBNN model             

data/models/tbnn_phll_caseholdout/USAGE.md

  • Created comprehensive usage guide for periodic hills TBNN model
  • Documented case-holdout training split (train: case_0p5/0p8/1p5, val:
    case_1p0, test: case_1p2)
  • Detailed feature definitions (5 Ling et al. 2016 invariants) and
    output tensor basis
  • Provided command-line usage examples and configuration file templates
  • Included training hyperparameters, performance metrics, and file
    structure reference
+119/-0 
README.md
Simplified data directory documentation with model focus 

data/README.md

  • Simplified root data directory documentation to redirect to
    data/models/README.md
  • Removed legacy weight export instructions for PyTorch and TensorFlow
  • Removed detailed feature ordering documentation
  • Added quick-start examples for using trained models with --nn_preset
    flag
  • Clarified that NN models now require explicit selection and no
    defaults exist
+21/-93 
metadata.json
Metadata documentation for periodic hills TBNN model         

data/models/tbnn_phll_caseholdout/metadata.json

  • Added metadata file documenting TBNN periodic hills model architecture
    and training
  • Specifies 5-64-64-64-4 layer configuration with tanh/linear
    activations
  • Documents 5 Ling et al. 2016 invariant features with z-score
    normalization
  • Includes training framework (PyTorch), optimizer (Adam), and dataset
    reference
+53/-0   
metadata.json
Metadata documentation for channel flow TBNN model             

data/models/tbnn_channel_caseholdout/metadata.json

  • Added metadata file documenting TBNN channel flow model architecture
    and training
  • Specifies 5-64-64-64-4 layer configuration with tanh/linear
    activations
  • Documents 5 Ling et al. 2016 invariant features with z-score
    normalization
  • Includes training framework (PyTorch), optimizer (Adam), and dataset
    reference
+53/-0   
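The 5-64-64-64-4 architecture documented in both metadata files corresponds to three tanh hidden layers and a linear output mapping five invariant features to four tensor-basis coefficients. A pure-Python sketch of the forward pass (weight shapes match the exported `layer{0..3}_{W,b}.txt` files; random values stand in for the trained weights):

```python
import math
import random

random.seed(0)

def linear(W, b, x):
    """y = W x + b, with W stored row-major as in the layer*_W.txt files."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def tbnn_forward(layers, x):
    """Three tanh hidden layers, linear output: 5 -> 64 -> 64 -> 64 -> 4."""
    for i, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if i < len(layers) - 1:          # tanh on hidden layers only
            x = [math.tanh(v) for v in x]
    return x

def rand_layer(n_out, n_in):
    # Stand-in initialization; trained values come from the text files.
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

layers = [rand_layer(64, 5), rand_layer(64, 64),
          rand_layer(64, 64), rand_layer(4, 64)]
coeffs = tbnn_forward(layers, [0.1, -0.3, 0.05, 0.2, -0.1])
```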
Additional files
34 files
input_means.txt +0/-5     
input_stds.txt +0/-5     
layer0_W.txt +0/-64   
layer0_b.txt +0/-64   
layer1_W.txt +0/-64   
layer1_b.txt +0/-64   
layer2_W.txt +0/-64   
layer2_b.txt +0/-64   
layer3_W.txt +0/-4     
layer3_b.txt +0/-4     
input_means.txt +0/-6     
input_stds.txt +0/-6     
layer0_W.txt +0/-32   
layer0_b.txt +0/-32   
layer1_W.txt +0/-32   
layer1_b.txt +0/-32   
layer2_W.txt +0/-1     
layer2_b.txt +0/-1     
metadata.json +0/-52   
input_means.txt +0/-5     
input_stds.txt +0/-5     
layer0_W.txt +0/-64   
layer0_b.txt +0/-64   
layer1_W.txt +0/-64   
layer1_b.txt +0/-64   
layer2_W.txt +0/-64   
layer2_b.txt +0/-64   
layer3_W.txt +0/-4     
layer3_b.txt +0/-4     
metadata.json +0/-66   
layer1_W.txt +64/-0   
layer2_W.txt +64/-0   
layer1_W.txt +64/-0   
layer2_W.txt +64/-0   

- Add trained TBNN models for channel and periodic hills flows
  - data/models/tbnn_channel_caseholdout/ (trained on McConkey channel data)
  - data/models/tbnn_phll_caseholdout/ (trained on McConkey periodic hills data)
  - Both models include weights, biases, normalization, and metadata

- Remove all example/dummy weights
  - Deleted legacy root-level layer files from data/
  - Removed example_scalar_nut/ and example_tbnn/ placeholder models
  - No more random/untrained weights in codebase

- Update model loading to require explicit selection
  - Modified Config to require --nn_preset or --weights/--scaling
  - Updated default paths from 'data/' to empty strings
  - Added validation in Config::finalize() for NN models

- Update all GPU CI tests to use trained models
  - test_nn_integration.cpp: Use tbnn_channel_caseholdout
  - test_gpu_execution.cpp: Use tbnn_channel_caseholdout
  - test_cpu_gpu_consistency.cpp: Use tbnn_channel_caseholdout
  - test_perturbed_channel.cpp: Dynamically select trained models

- Update documentation
  - Rewrote data/README.md to direct users to data/models/
  - Updated data/models/README.md with current model structure
  - Added USAGE.md files for each trained model

- Update .gitignore to track trained models
  - Keep *_caseholdout/ directories in version control
  - Ignore transient training outputs (job-ID suffixes)

- Add training infrastructure
  - scripts/train_realdata_local.sh for local training
  - Updated download_mcconkey_data.sh to use python -m kaggle

All tests pass on GPU hardware (V100) with trained weights.
@qodo-code-review
Copy link

qodo-code-review bot commented Dec 18, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Zip Slip extraction

Description: The script downloads an external zip and extracts it with unzip -q into ${OUTPUT_DIR}
without validating archive entry paths, which can enable a Zip Slip path traversal (e.g.,
../ or absolute paths inside the zip) if the downloaded dataset zip is malicious or
tampered with.
download_mcconkey_data.sh [40-57]

Referred Code
# Download dataset using Kaggle CLI
# Note: some Kaggle versions do NOT support `python -m kaggle` (no __main__).
export KAGGLE_CONFIG_DIR="${HOME}/.kaggle"

ZIP_DIR="${OUTPUT_DIR}/_kaggle_zip"
mkdir -p "${ZIP_DIR}"

kaggle datasets download -d "$DATASET_NAME" -p "$ZIP_DIR"

ZIP_FILE="$(ls -1t "${ZIP_DIR}"/*.zip 2>/dev/null | head -n 1 || true)"
if [[ -z "${ZIP_FILE}" ]]; then
    echo "ERROR: Kaggle download completed but no .zip was found in ${ZIP_DIR}" >&2
    exit 1
fi

echo "Unzipping ${ZIP_FILE} -> ${OUTPUT_DIR}"
unzip -q "${ZIP_FILE}" -d "${OUTPUT_DIR}"
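A minimal illustration of the entry-path validation that mitigates this class of issue (this is only a sketch of the core idea, not the PR's actual `safe_extract.py`, and it does not cover every edge case such as symlink entries):

```python
import pathlib
import zipfile

def check_zip_entries(zip_path, out_dir):
    """Reject archive entries that would escape out_dir (Zip Slip)."""
    out_dir = pathlib.Path(out_dir).resolve()
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            # Absolute paths and parent traversal are never acceptable.
            if name.startswith(("/", "\\")):
                raise RuntimeError(f"Absolute path in zip entry: {name!r}")
            if ".." in pathlib.PurePosixPath(name).parts:
                raise RuntimeError(f"Parent traversal in zip entry: {name!r}")
            # Belt-and-braces: resolved destination must stay inside out_dir.
            dest = (out_dir / name).resolve()
            if not dest.is_relative_to(out_dir):
                raise RuntimeError(f"Entry escapes output dir: {name!r}")
```

Only after every entry passes such a check should the archive be handed to `extractall`.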
Sensitive path disclosure

Description: The documentation includes a contributor-specific absolute filesystem path (e.g.,
/storage/home/...) that may reveal internal environment/user information when the
repository is shared publicly.
USAGE.md [125-132]

Referred Code
## Training Location

This model was trained in:
- Workspace: `/storage/home/hcoda1/6/sbryngelson3/scratch/nn-cfd`
- Training script: `scripts/run_channel_train.sbatch`
- Job ID: 2998847
- Dataset generator: `generate_channel_npz.py`
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed


🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing CLI validation: The script invokes kaggle and unzip without validating the commands exist or checking
their exit status, which can cause confusing failures instead of actionable error
handling.

Referred Code
# Download dataset using Kaggle CLI
# Note: some Kaggle versions do NOT support `python -m kaggle` (no __main__).
export KAGGLE_CONFIG_DIR="${HOME}/.kaggle"

ZIP_DIR="${OUTPUT_DIR}/_kaggle_zip"
mkdir -p "${ZIP_DIR}"

kaggle datasets download -d "$DATASET_NAME" -p "$ZIP_DIR"

ZIP_FILE="$(ls -1t "${ZIP_DIR}"/*.zip 2>/dev/null | head -n 1 || true)"
if [[ -z "${ZIP_FILE}" ]]; then
    echo "ERROR: Kaggle download completed but no .zip was found in ${ZIP_DIR}" >&2
    exit 1
fi

echo "Unzipping ${ZIP_FILE} -> ${OUTPUT_DIR}"
unzip -q "${ZIP_FILE}" -d "${OUTPUT_DIR}"


Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label


qodo-code-review bot commented Dec 18, 2025

PR Code Suggestions ✨

Latest suggestions up to 282ad8d

Category | Suggestion | Impact
Security
Harden path traversal validation

Harden the path validation in safe_extract.py to prevent bypasses on Windows.
Normalize path separators, explicitly check for Windows absolute paths, and
reject paths containing NUL bytes.

scripts/safe_extract.py [53-73]

 for info in z.infolist():
     name = info.filename
 
-    # Check for absolute paths
-    if name.startswith(("/", "\\")):
+    # Reject NUL bytes (can confuse path handling)
+    if "\x00" in name:
+        unsafe_entries.append(f"NUL byte in path: {name!r}")
+        continue
+
+    # Normalize separators so traversal checks can't be bypassed with backslashes
+    norm_name = name.replace("\\", "/")
+
+    # Reject absolute paths and Windows drive/UNC paths
+    if norm_name.startswith("/"):
         unsafe_entries.append(f"Absolute path: {name}")
+        continue
+    if pathlib.PureWindowsPath(norm_name).is_absolute() or pathlib.PureWindowsPath(norm_name).drive:
+        unsafe_entries.append(f"Absolute/drive path: {name}")
         continue
 
     # Check for parent directory traversal
-    parts = pathlib.PurePosixPath(name).parts
+    parts = pathlib.PurePosixPath(norm_name).parts
     if ".." in parts:
         unsafe_entries.append(f"Parent traversal: {name}")
         continue
 
     # Verify resolved path is within output directory
-    dest = (out_dir / name).resolve()
+    dest = (out_dir / norm_name).resolve()
     try:
         dest.relative_to(out_dir)
     except ValueError:
         unsafe_entries.append(f"Escapes output dir: {name}")
         continue

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 10


Why: This suggestion identifies a critical security flaw in the new safe_extract function, where the path validation can be bypassed on Windows, defeating the purpose of the Zip Slip protection.

High
Block symlink-based zip slip

Prevent a symlink-based Zip Slip vulnerability by adding a check to reject ZIP
archives that contain symbolic links before calling z.extractall(out_dir).

scripts/safe_extract.py [80-85]

 # All entries validated - safe to extract
 if verbose:
     print(f"Validated {len(z.infolist())} entries - all safe")
     print("Extracting...")
 
+# Block symlink entries to prevent "symlink zip slip" (write-through symlink)
+for info in z.infolist():
+    is_symlink = (info.external_attr >> 16) & 0o170000 == 0o120000
+    if is_symlink:
+        raise RuntimeError(f"Unsafe ZIP archive detected:\n  - Symlink entry: {info.filename}")
+
 z.extractall(out_dir)


Suggestion importance[1-10]: 10


Why: This suggestion points out a critical security vulnerability (symlink-based Zip Slip) that is not handled by the current implementation, allowing an attacker to write files outside the intended extraction directory.

High
Possible issue
Make tests fail correctly

Refactor the test functions in test_zip_slip_protection to use assert statements
instead of return False. This ensures that test failures are correctly reported
by pytest.

tests/test_security_fixes.py [63-143]

 def test_zip_slip_protection():
     """Test that Zip Slip protection works."""
     print("\n" + "="*70)
     print("TEST: Zip Slip Protection")
     print("="*70)
 
     with tempfile.TemporaryDirectory() as tmpdir:
         tmpdir = pathlib.Path(tmpdir)
 
         # Test 1: Safe ZIP (should succeed)
         safe_zip = tmpdir / "safe.zip"
         with zipfile.ZipFile(safe_zip, 'w') as z:
             z.writestr("data/file1.txt", "content1")
             z.writestr("data/subdir/file2.txt", "content2")
 
         out_dir = tmpdir / "output_safe"
-        try:
-            safe_extract(safe_zip, out_dir, verbose=False)
-            assert (out_dir / "data" / "file1.txt").exists()
-            assert (out_dir / "data" / "subdir" / "file2.txt").exists()
-            print("✓ Safe ZIP extracted successfully")
-        except Exception as e:
-            print(f"✗ Safe ZIP failed: {e}")
-            return False
-        ...
+        safe_extract(safe_zip, out_dir, verbose=False)
+        assert (out_dir / "data" / "file1.txt").exists()
+        assert (out_dir / "data" / "subdir" / "file2.txt").exists()
+
+        # Test 2: ZIP with parent traversal (should fail)
+        traversal_zip = tmpdir / "traversal.zip"
+        with zipfile.ZipFile(traversal_zip, 'w') as z:
+            z.writestr("data/file1.txt", "safe")
+            z.writestr("../evil.txt", "malicious")
+
+        out_dir = tmpdir / "output_traversal"
         try:
             safe_extract(traversal_zip, out_dir, verbose=False)
-            print("✗ Parent traversal ZIP was not blocked!")
-            return False
+            assert False, "Parent traversal ZIP was not blocked"
         except RuntimeError as e:
-            if "Parent traversal" in str(e):
-                print("✓ Parent traversal blocked")
-            else:
-                print(f"✗ Wrong error: {e}")
-                return False
-        ...
-    print("\n✓ All Zip Slip protection tests passed!")
-    return True
+            assert "Parent traversal" in str(e)
 
+        # Test 3: ZIP with absolute path (should fail)
+        absolute_zip = tmpdir / "absolute.zip"
+        with zipfile.ZipFile(absolute_zip, 'w') as z:
+            z.writestr("data/file1.txt", "safe")
+            z.writestr("/tmp/evil.txt", "malicious")
+
+        out_dir = tmpdir / "output_absolute"
+        try:
+            safe_extract(absolute_zip, out_dir, verbose=False)
+            assert False, "Absolute path ZIP was not blocked"
+        except RuntimeError as e:
+            assert "Absolute path" in str(e)
+
+        # Test 4: ZIP with complex traversal (should fail)
+        complex_zip = tmpdir / "complex.zip"
+        with zipfile.ZipFile(complex_zip, 'w') as z:
+            z.writestr("data/file1.txt", "safe")
+            z.writestr("data/../../evil.txt", "malicious")
+
+        out_dir = tmpdir / "output_complex"
+        try:
+            safe_extract(complex_zip, out_dir, verbose=False)
+            assert False, "Complex traversal ZIP was not blocked"
+        except RuntimeError as e:
+            assert ("Parent traversal" in str(e)) or ("Escapes" in str(e))
+


Suggestion importance[1-10]: 8


Why: This suggestion correctly identifies that the test functions use return False instead of assertions, which would cause pytest to incorrectly report failing tests as passing, making the test suite unreliable.

Medium
Unify strain magnitude definition
Suggestion Impact: Updated S_mag in the GPU postprocessing fallback path from sqrt(2.0*(...)) to sqrt(...), matching the Frobenius norm definition; also updated comments to reflect this.

code diff:

-                    // Fallback: use strain magnitude
+                    // Fallback: use strain magnitude (Frobenius norm)
                     const double dudx_v = dudx[idx_cell];
                     const double dvdy_v = dvdy[idx_cell];
                     const double Sxx = dudx_v;
                     const double Syy = dvdy_v;
-                    const double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
+                    const double S_mag = sqrt(Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy);
                     if (S_mag > 1e-10) {
                         const double b_mag = sqrt(b_xx*b_xx + 2.0*b_xy*b_xy + b_yy*b_yy);
                         nu_t_val = k_val * b_mag / S_mag;
@@ -508,12 +508,12 @@
                 if (fabs(Sxy) > 1e-10) {
                     nu_t_val = fabs(-b_xy * k_val / Sxy);
                 } else {
-                    // Fallback: use strain magnitude
+                    // Fallback: use strain magnitude (Frobenius norm)
                     const double dudx_v = dudx[idx_cell];
                     const double dvdy_v = dvdy[idx_cell];
                     const double Sxx = dudx_v;
                     const double Syy = dvdy_v;
-                    const double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
+                    const double S_mag = sqrt(Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy);
                     if (S_mag > 1e-10) {
                         const double b_mag = sqrt(b_xx*b_xx + 2.0*b_xy*b_xy + b_yy*b_yy);
                         nu_t_val = k_val * b_mag / S_mag;

Unify the strain magnitude (S_mag) calculation in postprocess_nn_outputs_gpu to
use the Frobenius norm, matching the definition in other GPU kernels for
consistency.

src/gpu_kernels.cpp [516-520]

-const double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
+const double S_mag = sqrt(Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy);
 if (S_mag > 1e-10) {
     const double b_mag = sqrt(b_xx*b_xx + 2.0*b_xy*b_xy + b_yy*b_yy);
     nu_t_val = k_val * b_mag / S_mag;
 }


Suggestion importance[1-10]: 7


Why: This is a valid correctness fix, as it points out an inconsistent formula for S_mag between feature generation and post-processing on the GPU, which could lead to incorrect physical results.

Medium
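The factor-of-√2 discrepancy is easy to check numerically. A minimal sketch (strain components are made up) confirming that the corrected expression is the Frobenius norm of the symmetric 2D strain tensor, and that the old formula overshoots by exactly √2:

```python
import math

# Hypothetical strain components for a symmetric 2D tensor S
Sxx, Syy, Sxy = 0.3, -0.1, 0.25

# Corrected formula: Frobenius norm; the off-diagonal Sxy
# appears twice in the matrix, hence the factor of 2.
S_mag = math.sqrt(Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy)

# Same thing computed elementwise over the full 2x2 matrix
S = [[Sxx, Sxy], [Sxy, Syy]]
frob = math.sqrt(sum(e*e for row in S for e in row))

# Old formula with the extra leading 2: larger by sqrt(2)
old = math.sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy))
```

Since nu_t is computed as k·|b|/S_mag in the fallback branch, the √2 mismatch between kernels would bias the eddy viscosity by a constant factor.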
Make reference loading deterministic
Suggestion Impact:The commit removes the O(N^2) nearest-neighbor scan and replaces it with direct i/j index computation using (x-x0)/dx and (y-y0)/dy with llround, adds bounds checking, and adds a tolerance-based sanity check before setting the field value.

code diff:

+    // Direct indexing for uniform mesh (much faster than nearest-neighbor)
+    const double x0 = mesh.x(mesh.i_begin());
+    const double y0 = mesh.y(mesh.j_begin());
+    const double inv_dx = 1.0 / mesh.dx;
+    const double inv_dy = 1.0 / mesh.dy;
+    
     while (std::getline(file, line)) {
         // Skip comments and blank lines
         if (line.empty() || line[0] == '#') continue;
@@ -52,28 +58,28 @@
         double x, y, value;
         if (!(iss >> x >> y >> value)) continue;
         
-        // Find closest mesh point (simple nearest-neighbor)
-        int best_i = -1, best_j = -1;
-        double min_dist = 1e30;
-        for (int j = mesh.j_begin(); j < mesh.j_end(); ++j) {
-            for (int i = mesh.i_begin(); i < mesh.i_end(); ++i) {
-                const double dx = mesh.x(i) - x;
-                const double dy = mesh.y(j) - y;
-                const double dist = dx*dx + dy*dy;
-                if (dist < min_dist) {
-                    min_dist = dist;
-                    best_i = i;
-                    best_j = j;
-                }
-            }
-        }
-        if (best_i >= 0 && best_j >= 0) {
-            // Count only if this cell wasn't already set
-            if (!std::isfinite(field(best_i, best_j))) {
-                ++num_set;
-            }
-            field(best_i, best_j) = value;
-        }
+        // Direct index calculation for uniform mesh
+        const int i = mesh.i_begin() + static_cast<int>(std::llround((x - x0) * inv_dx));
+        const int j = mesh.j_begin() + static_cast<int>(std::llround((y - y0) * inv_dy));
+        
+        // Check bounds
+        if (i < mesh.i_begin() || i >= mesh.i_end() || j < mesh.j_begin() || j >= mesh.j_end()) {
+            continue; // out-of-domain line
+        }
+        
+        // Optional sanity: ensure the file point matches the chosen cell center
+        // Use a tolerance that accounts for typical printf/iostream rounding
+        const double dx_err = std::abs(mesh.x(i) - x);
+        const double dy_err = std::abs(mesh.y(j) - y);
+        if (dx_err > 0.01 * mesh.dx || dy_err > 0.01 * mesh.dy) {
+            continue;
+        }
+        
+        // Count only if this cell wasn't already set
+        if (!std::isfinite(field(i, j))) {
+            ++num_set;
+        }
+        field(i, j) = value;
     }

Refactor the read_scalar_field_from_dat function to improve performance and
robustness. Replace the slow nearest-neighbor search with a direct index
calculation for the uniform mesh.

tests/test_cpu_gpu_consistency.cpp [36-87]

 ScalarField read_scalar_field_from_dat(const std::string& filename, const Mesh& mesh) {
     std::ifstream file(filename);
     if (!file) {
         throw std::runtime_error("Cannot open reference file: " + filename);
     }
-    
-    // Initialize with NaN to detect unpopulated cells
+
     ScalarField field(mesh, std::numeric_limits<double>::quiet_NaN());
     std::string line;
     int num_set = 0;
-    
+
+    const double x0 = mesh.x(mesh.i_begin());
+    const double y0 = mesh.y(mesh.j_begin());
+    const double inv_dx = 1.0 / mesh.dx;
+    const double inv_dy = 1.0 / mesh.dy;
+
     while (std::getline(file, line)) {
-        // Skip comments and blank lines
         if (line.empty() || line[0] == '#') continue;
-        
+
         std::istringstream iss(line);
         double x, y, value;
         if (!(iss >> x >> y >> value)) continue;
-        
-        // Find closest mesh point (simple nearest-neighbor)
-        int best_i = -1, best_j = -1;
-        double min_dist = 1e30;
-        for (int j = mesh.j_begin(); j < mesh.j_end(); ++j) {
-            for (int i = mesh.i_begin(); i < mesh.i_end(); ++i) {
-                const double dx = mesh.x(i) - x;
-                const double dy = mesh.y(j) - y;
-                const double dist = dx*dx + dy*dy;
-                if (dist < min_dist) {
-                    min_dist = dist;
-                    best_i = i;
-                    best_j = j;
-                }
-            }
+
+        const int i = mesh.i_begin() + static_cast<int>(std::llround((x - x0) * inv_dx));
+        const int j = mesh.j_begin() + static_cast<int>(std::llround((y - y0) * inv_dy));
+
+        if (i < mesh.i_begin() || i >= mesh.i_end() || j < mesh.j_begin() || j >= mesh.j_end()) {
+            continue;  // out-of-domain line
         }
-        if (best_i >= 0 && best_j >= 0) {
-            // Count only if this cell wasn't already set
-            if (!std::isfinite(field(best_i, best_j))) {
-                ++num_set;
-            }
-            field(best_i, best_j) = value;
+
+        // Optional sanity: ensure (x,y) matches the target cell center closely
+        const double dx_err = std::abs(mesh.x(i) - x);
+        const double dy_err = std::abs(mesh.y(j) - y);
+        if (dx_err > 1e-8 || dy_err > 1e-8) {
+            continue;
         }
+
+        if (!std::isfinite(field(i, j))) {
+            ++num_set;
+        }
+        field(i, j) = value;
     }
-    
-    // Verify all interior cells were populated
+
     const int expected = (mesh.i_end() - mesh.i_begin()) * (mesh.j_end() - mesh.j_begin());
     if (num_set != expected) {
         throw std::runtime_error("Reference file did not populate all interior cells: " +
                                  std::to_string(num_set) + "/" + std::to_string(expected));
     }
-    
+
     return field;
 }

Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a performance and robustness issue in a test helper function and proposes a much better implementation, improving the quality and reliability of the test suite.

Low
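The direct-index lookup from the applied commit can be sketched in a few lines. This is a hypothetical stand-alone version (the function name and the 0.01·dx tolerance mirror the commit but are illustrative), assuming a uniform mesh whose first interior cell center sits at (x0, y0):

```python
# Map a point (x, y) to mesh indices by rounding, instead of an
# O(N^2) nearest-neighbor scan. Returns None for out-of-domain
# points or points that don't land near a cell center.
def locate(x, y, x0, y0, dx, dy, i_begin, i_end, j_begin, j_end):
    i = i_begin + int(round((x - x0) / dx))
    j = j_begin + int(round((y - y0) / dy))
    if not (i_begin <= i < i_end and j_begin <= j < j_end):
        return None  # out-of-domain line
    # Sanity check: the file point must match the chosen cell center
    if abs((x0 + (i - i_begin) * dx) - x) > 0.01 * dx:
        return None
    if abs((y0 + (j - j_begin) * dy) - y) > 0.01 * dy:
        return None
    return i, j
```

This is O(1) per point versus O(Nx·Ny) for the scan, and unlike nearest-neighbor it rejects rather than silently snaps points that belong to no cell.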
Remove hard-coded basis sizing
Suggestion Impact:Updated basis_flat_ resize to use TensorBasis::NUM_BASIS * 3 instead of the hard-coded 12, making the buffer sizing depend on the actual number of basis tensors.

code diff:

     features_flat_.resize(n_cells * feature_dim);
-    basis_flat_.resize(n_cells * 12);  // 4 basis tensors × 3 components
+    basis_flat_.resize(n_cells * TensorBasis::NUM_BASIS * 3);  // NUM_BASIS tensors × 3 components

Replace the hard-coded size 12 for basis_flat_ with a calculation using
TensorBasis::NUM_BASIS to avoid potential memory corruption if the number of
basis tensors changes.

src/turbulence_nn_tbnn.cpp [72-76]

 // Allocate CPU buffers
 features_flat_.resize(n_cells * feature_dim);
-basis_flat_.resize(n_cells * 12);  // 4 basis tensors × 3 components
+basis_flat_.resize(n_cells * TensorBasis::NUM_BASIS * 3);  // NUM_BASIS tensors × 3 components
 outputs_flat_.resize(n_cells * output_dim);
 workspace_.resize(workspace_size);

Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a hard-coded value (12) that should be derived from the TensorBasis::NUM_BASIS constant to improve maintainability and prevent future bugs.

Low
Possible issue
Map correct device buffer size
Suggestion Impact:The commit changes total_field_size from (Nx + 2*Ng) * (Ny + 2*Ng) to stride * (Ny + 2*Ng), matching the suggestion to correctly size the mapped nu_t_field buffer on the GPU.

code diff:

-    const int total_field_size = (Nx + 2*Ng) * (Ny + 2*Ng);
+    const int total_field_size = stride * (Ny + 2*Ng);

In postprocess_mlp_outputs_gpu, correct the calculation of total_field_size to
use the passed stride parameter, ensuring the OpenMP target mapping for
nu_t_field is large enough to prevent out-of-bounds writes on the GPU.

src/gpu_kernels.cpp [225-250]

-const int total_field_size = (Nx + 2*Ng) * (Ny + 2*Ng);
+const int total_field_size = stride * (Ny + 2*Ng);
 
 // CRITICAL: map(present:...) indicates these arrays are already mapped
 #pragma omp target teams distribute parallel for collapse(2) \
     map(present: nn_outputs[0:(Nx*Ny)], nu_t_field[0:total_field_size])
 for (int jj = 0; jj < Ny; ++jj) {
     for (int ii = 0; ii < Nx; ++ii) {
         const int i = ii + Ng;
         const int j = jj + Ng;
         const int idx_in = jj * Nx + ii;
         const int idx_out = j * stride + i;
         
-        // Get NN prediction
         double nu_t_val = nn_outputs[idx_in];
         
-        // Apply realizability and clipping
-        if (nu_t_val != nu_t_val || nu_t_val < 0.0) {  // NaN or negative
+        if (nu_t_val != nu_t_val || nu_t_val < 0.0) {
             nu_t_val = 0.0;
         }
         if (nu_t_val > nu_t_max) {
             nu_t_val = nu_t_max;
         }
         
         nu_t_field[idx_out] = nu_t_val;
     }
 }

Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies a critical bug where the allocated size for nu_t_field on the GPU is calculated with an incorrect stride, which could lead to out-of-bounds memory writes and undefined behavior.

Medium
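The hazard is easy to see with concrete numbers. A sketch (dimensions and padding are made up) showing that the highest index the post-processing loop writes exceeds the unpadded size whenever the row stride is larger than Nx + 2*Ng, while the stride-based size always covers it:

```python
# Hypothetical field dimensions with a padded row stride
Nx, Ny, Ng = 8, 4, 2
stride = Nx + 2*Ng + 6  # padded layout: stride > Nx + 2*Ng

wrong_size = (Nx + 2*Ng) * (Ny + 2*Ng)  # buggy mapped size
right_size = stride * (Ny + 2*Ng)        # stride-aware size

# Largest index written by the loop: j = Ny-1+Ng, i = Nx-1+Ng
max_idx = (Ny - 1 + Ng) * stride + (Nx - 1 + Ng)
```

With the buggy size, the device mapping for nu_t_field is too small and the final rows of the loop write past the mapped region, which is undefined behavior under OpenMP target offload.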
Use correct stride parameters
Suggestion Impact:The commit replaces the hard-coded u_stride and v_stride arguments with vel.u_stride() and vel.v_stride(), improving robustness as suggested.

code diff:

-        mesh.Nx + 2*mesh.Nghost + 1,  // u_stride
-        mesh.Nx + 2*mesh.Nghost,       // v_stride
-        mesh.total_Nx(),                // cell_stride
+        vel.u_stride(),        // u_stride
+        vel.v_stride(),        // v_stride
+        mesh.total_Nx(),       // cell_stride

Replace hard-coded stride calculations with accessor methods like vel.u_stride()
when calling gpu_kernels::compute_gradients_from_mac_gpu to improve code
robustness.

tests/test_features.cpp [397-406]

 gpu_kernels::compute_gradients_from_mac_gpu(
     u_ptr, v_ptr,
     dudx_ptr, dudy_ptr, dvdx_ptr, dvdy_ptr,
     mesh.Nx, mesh.Ny, mesh.Nghost,
     mesh.dx, mesh.dy,
-    mesh.Nx + 2*mesh.Nghost + 1,  // u_stride
-    mesh.Nx + 2*mesh.Nghost,       // v_stride
-    mesh.total_Nx(),                // cell_stride
+    vel.u_stride(),        // u_stride
+    vel.v_stride(),        // v_stride
+    mesh.total_Nx(),       // cell_stride
     u_total, v_total, total_cells
 );

Suggestion importance[1-10]: 7

Why: The suggestion correctly points out that using accessor methods like vel.u_stride() is more robust than hardcoding stride calculations, improving maintainability and reducing the risk of future bugs if data structures change.

Medium
Detect incomplete layer file pairs

Modify the file validation loop to detect and report an error if a model layer
is incomplete (i.e., a weight file exists without its corresponding bias file,
or vice-versa).

scripts/validate_model_files.py [110-115]

 while True:
     W_file = model_dir / f'layer{layer_idx}_W.txt'
     b_file = model_dir / f'layer{layer_idx}_b.txt'
-    
-    if not W_file.exists() or not b_file.exists():
+
+    has_W = W_file.exists()
+    has_b = b_file.exists()
+
+    if not has_W and not has_b:
         break
 
+    if has_W != has_b:
+        issues.append(
+            f"Incomplete layer {layer_idx} files: "
+            f"W exists={has_W}, b exists={has_b}"
+        )
+        break
+

Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a logic flaw where incomplete model layer files are not detected, and the proposed change improves the validation script's correctness.

Medium
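The proposed pairing check can be exercised in isolation. A minimal sketch (scan_layers is a hypothetical stand-in for the validation loop) with one complete layer and one orphaned weight file:

```python
import tempfile
from pathlib import Path

# A weight file without its bias file (or vice versa) is reported
# as an issue instead of being silently treated as "end of layers".
def scan_layers(model_dir: Path):
    issues, layer_idx = [], 0
    while True:
        has_W = (model_dir / f'layer{layer_idx}_W.txt').exists()
        has_b = (model_dir / f'layer{layer_idx}_b.txt').exists()
        if not has_W and not has_b:
            break
        if has_W != has_b:
            issues.append(f"Incomplete layer {layer_idx}: W={has_W}, b={has_b}")
            break
        layer_idx += 1
    return layer_idx, issues

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / 'layer0_W.txt').write_text('1 2\n')
    (d / 'layer0_b.txt').write_text('0\n')
    (d / 'layer1_W.txt').write_text('3 4\n')  # no layer1_b.txt
    n_layers, issues = scan_layers(d)
```

Without the `has_W != has_b` branch, the original loop would stop at layer 1 and report a one-layer model with no error, masking a corrupted export.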
Use real strides and sizes

Replace hard-coded stride and size calculations with accessor methods from the
mesh and velocity objects to improve code robustness and consistency.

src/turbulence_nn_mlp.cpp [192-197]

-const int total_cells = (Nx + 2*Ng) * (Ny + 2*Ng);
+const int total_cells = mesh.total_cells();
 const int u_total = velocity.u_total_size();
 const int v_total = velocity.v_total_size();
-const int cell_stride = Nx + 2*Ng;
-const int u_stride = Nx + 2*Ng + 1;
-const int v_stride = Nx + 2*Ng;
+const int cell_stride = mesh.total_Nx();
+const int u_stride = velocity.u_stride();
+const int v_stride = velocity.v_stride();

Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies that hard-coding stride and size calculations is brittle. Using accessor methods from mesh and velocity makes the code more robust and maintainable, as demonstrated by the consistent approach in turbulence_nn_tbnn.cpp.

Low
Handle single-value stats safely

Ensure np.loadtxt results are always arrays to prevent a TypeError when a file
contains a single value, and add a shape mismatch check for robustness.

scripts/fix_normalization_stats.py [108-136]

-means = np.loadtxt(means_file)
-stds = np.loadtxt(stds_file)
+means = np.atleast_1d(np.loadtxt(means_file)).astype(float).ravel()
+stds = np.atleast_1d(np.loadtxt(stds_file)).astype(float).ravel()
+if means.shape != stds.shape:
+    print(f"ERROR: Means/stds shape mismatch: means={means.shape}, stds={stds.shape}")
+    return False
 ...
-n_features = len(means)
+n_features = means.size

Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a TypeError edge case when np.loadtxt reads a single value and provides a robust fix, improving code correctness.

Low
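The underlying NumPy behavior is worth demonstrating: np.loadtxt returns a 0-d array for a single-value file, so len() on it raises TypeError, while np.atleast_1d(...).ravel() normalizes both the single-value and multi-value cases:

```python
import io
import numpy as np

# Single-value file -> 0-d array; len(single) would raise TypeError
single = np.loadtxt(io.StringIO("0.5\n"))

# atleast_1d + ravel gives a uniform 1-D shape either way
means = np.atleast_1d(single).astype(float).ravel()
multi = np.atleast_1d(np.loadtxt(io.StringIO("1.0 2.0 3.0\n"))).ravel()
```

Using `means.size` instead of `len(means)` is then safe for any feature count, including the degenerate one-feature model.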
Require minimum model files exist

Strengthen the resolve_model_dir test helper by checking for both layer0_W.txt
and layer0_b.txt to ensure a minimal set of model files exists, preventing flaky
test failures.

tests/test_backend_execution.cpp [31-51]

 static std::string resolve_model_dir(const std::string& p) {
     // Strip trailing slashes
     std::string path = p;
     while (!path.empty() && path.back() == '/') {
         path.pop_back();
     }
-    
+
+    auto has_min_files = [&](const std::string& dir) {
+        return file_exists(dir + "/layer0_W.txt") && file_exists(dir + "/layer0_b.txt");
+    };
+
     // Try relative to current directory (when running from repo root)
-    if (file_exists(path + "/layer0_W.txt")) {
+    if (has_min_files(path)) {
         return path;
     }
-    
+
     // Try relative to build directory (when running from build/)
-    if (file_exists("../" + path + "/layer0_W.txt")) {
+    if (has_min_files("../" + path)) {
         return "../" + path;
     }
-    
+
     throw std::runtime_error(
         "NN model files not found. Tried: " + path + " and ../" + path
     );
 }

Suggestion importance[1-10]: 5

Why: The suggestion correctly points out a weakness in the model directory validation logic within the test suite. Checking for both weights and biases files makes the test helper more robust and provides clearer failure modes, improving the reliability of the tests.

Low
General
Fail fast when CLI missing

Add a check to ensure the kaggle command-line tool is installed before
attempting to use it, providing a clearer error message to the user if it is
missing.

scripts/download_mcconkey_data.sh [47]

+if ! command -v kaggle >/dev/null 2>&1; then
+    echo "ERROR: Kaggle CLI not found on PATH." >&2
+    echo "Install it with: pip install kaggle" >&2
+    exit 1
+fi
+
 kaggle datasets download -d "$DATASET_NAME" -p "$ZIP_DIR"

Suggestion importance[1-10]: 5

Why: The suggestion correctly points out that a check for the kaggle CLI was removed and suggests re-adding it for better user experience, which is a valid improvement.

Low

Previous suggestions

✅ Suggestions up to commit ddc4146
Category | Suggestion | Impact
Possible issue
Fix ghosted indexing and mapping
Suggestion Impact:The function signature was changed from n_cells to Nx/Ny/Ng plus cell_stride/total_cells, and the kernel was rewritten from a 1D loop to a collapsed 2D (jj, ii) loop computing idx_out (interior) and idx_cell (ghosted). Additionally, when compute_tau is true, tau_xx/tau_xy/tau_yy are included in the OpenMP target map(present: ...) clause and written using idx_cell.

code diff:

@@ -385,7 +385,9 @@
     const double* dvdx, const double* dvdy,
     double* nu_t,
     double* tau_xx, double* tau_xy, double* tau_yy,
-    int n_cells, int output_dim,
+    int Nx, int Ny, int Ng,
+    int cell_stride, int total_cells,
+    int output_dim,
     double nu_ref)
 {
 #ifdef USE_GPU_OFFLOAD
@@ -393,78 +395,151 @@
     const bool compute_tau = (tau_xx != nullptr);
     
     // CRITICAL: map(present:...) indicates these arrays are already mapped by turbulence model
-    // Note: tau arrays may be nullptr, but map clause is safe (runtime ignores null pointers)
-    #pragma omp target teams distribute parallel for \
-        map(present: nn_outputs[0:(n_cells*output_dim)], basis[0:(n_cells*12)], \
-                     k[0:n_cells], dudx[0:n_cells], dudy[0:n_cells], \
-                     dvdx[0:n_cells], dvdy[0:n_cells], nu_t[0:n_cells])
-    for (int idx = 0; idx < n_cells; ++idx) {
-        // Extract G coefficients from NN output
-        double G[4] = {0.0, 0.0, 0.0, 0.0};
-        int out_base = idx * output_dim;
-        for (int n = 0; n < NUM_BASIS && n < output_dim; ++n) {
-            G[n] = nn_outputs[out_base + n];
-        }
-        
-        // Get basis tensors
-        int basis_base = idx * 12;
-        
-        // Construct anisotropy tensor: b_ij = sum_n G_n * T^n_ij
-        double b_xx = 0.0, b_xy = 0.0, b_yy = 0.0;
-        for (int n = 0; n < NUM_BASIS; ++n) {
-            b_xx += G[n] * basis[basis_base + n*3 + 0];
-            b_xy += G[n] * basis[basis_base + n*3 + 1];
-            b_yy += G[n] * basis[basis_base + n*3 + 2];
-        }
-        
-        // Compute Reynolds stresses if requested
-        double k_val = k[idx];
-        if (compute_tau) {
-            double k_safe = (k_val > 0.0) ? k_val : 0.0;
-            tau_xx[idx] = 2.0 * k_safe * (b_xx + 1.0/3.0);
-            tau_xy[idx] = 2.0 * k_safe * b_xy;
-            tau_yy[idx] = 2.0 * k_safe * (b_yy + 1.0/3.0);
-        }
-        
-        // Compute equivalent eddy viscosity
-        // nu_t = -b_xy * k / S_xy (when S_xy is significant)
-        double dudy_v = dudy[idx];
-        double dvdx_v = dvdx[idx];
-        double Sxy = 0.5 * (dudy_v + dvdx_v);
-        
-        double nu_t_val = 0.0;
-        if (fabs(Sxy) > 1e-10) {
-            nu_t_val = fabs(-b_xy * k_val / Sxy);
-        } else {
-            // Fallback: use strain magnitude
-            double dudx_v = dudx[idx];
-            double dvdy_v = dvdy[idx];
-            double Sxx = dudx_v;
-            double Syy = dvdy_v;
-            double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
-            if (S_mag > 1e-10) {
-                double b_mag = sqrt(b_xx*b_xx + 2.0*b_xy*b_xy + b_yy*b_yy);
-                nu_t_val = k_val * b_mag / S_mag;
+    // nn_outputs/basis are interior-only (Nx*Ny), k/gradients/nu_t/tau are ghosted (total_cells)
+    if (!compute_tau) {
+        #pragma omp target teams distribute parallel for collapse(2) \
+            map(present: nn_outputs[0:(Nx*Ny*output_dim)], basis[0:(Nx*Ny*12)], \
+                         k[0:total_cells], dudx[0:total_cells], dudy[0:total_cells], \
+                         dvdx[0:total_cells], dvdy[0:total_cells], nu_t[0:total_cells])
+        for (int jj = 0; jj < Ny; ++jj) {
+            for (int ii = 0; ii < Nx; ++ii) {
+                const int i = ii + Ng;
+                const int j = jj + Ng;
+                const int idx_cell = j * cell_stride + i;   // ghosted field index
+                const int idx_out  = jj * Nx + ii;          // interior index for nn/basis
+                
+                // Extract G coefficients from NN output
+                double G[NUM_BASIS] = {0.0, 0.0, 0.0, 0.0};
+                const int out_base = idx_out * output_dim;
+                for (int n = 0; n < NUM_BASIS && n < output_dim; ++n) {
+                    G[n] = nn_outputs[out_base + n];
+                }
+                
+                // Get basis tensors
+                const int basis_base = idx_out * 12;
+                
+                // Construct anisotropy tensor: b_ij = sum_n G_n * T^n_ij
+                double b_xx = 0.0, b_xy = 0.0, b_yy = 0.0;
+                for (int n = 0; n < NUM_BASIS; ++n) {
+                    b_xx += G[n] * basis[basis_base + n*3 + 0];
+                    b_xy += G[n] * basis[basis_base + n*3 + 1];
+                    b_yy += G[n] * basis[basis_base + n*3 + 2];
+                }
+                
+                // Compute equivalent eddy viscosity
+                const double k_val = k[idx_cell];
+                const double dudy_v = dudy[idx_cell];
+                const double dvdx_v = dvdx[idx_cell];
+                const double Sxy = 0.5 * (dudy_v + dvdx_v);
+                
+                double nu_t_val = 0.0;
+                if (fabs(Sxy) > 1e-10) {
+                    nu_t_val = fabs(-b_xy * k_val / Sxy);
+                } else {
+                    // Fallback: use strain magnitude
+                    const double dudx_v = dudx[idx_cell];
+                    const double dvdy_v = dvdy[idx_cell];
+                    const double Sxx = dudx_v;
+                    const double Syy = dvdy_v;
+                    const double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
+                    if (S_mag > 1e-10) {
+                        const double b_mag = sqrt(b_xx*b_xx + 2.0*b_xy*b_xy + b_yy*b_yy);
+                        nu_t_val = k_val * b_mag / S_mag;
+                    }
+                }
+                
+                // Clip to reasonable bounds
+                const double max_nu_t = 10.0 * nu_ref;
+                nu_t_val = (nu_t_val > 0.0) ? nu_t_val : 0.0;
+                nu_t_val = (nu_t_val < max_nu_t) ? nu_t_val : max_nu_t;
+                
+                // Check for NaN/Inf
+                if (nu_t_val != nu_t_val || nu_t_val > 1e30) {
+                    nu_t_val = 0.0;
+                }
+                
+                nu_t[idx_cell] = nu_t_val;
             }
         }
-        
-        // Clip to reasonable bounds
-        double max_nu_t = 10.0 * nu_ref;
-        nu_t_val = (nu_t_val > 0.0) ? nu_t_val : 0.0;
-        nu_t_val = (nu_t_val < max_nu_t) ? nu_t_val : max_nu_t;
-        
-        // Check for NaN/Inf
-        if (nu_t_val != nu_t_val || nu_t_val > 1e30) {  // NaN check: x != x is true only for NaN
-            nu_t_val = 0.0;
-        }
-        
-        nu_t[idx] = nu_t_val;
+    } else {
+        #pragma omp target teams distribute parallel for collapse(2) \
+            map(present: nn_outputs[0:(Nx*Ny*output_dim)], basis[0:(Nx*Ny*12)], \
+                         k[0:total_cells], dudx[0:total_cells], dudy[0:total_cells], \
+                         dvdx[0:total_cells], dvdy[0:total_cells], nu_t[0:total_cells], \
+                         tau_xx[0:total_cells], tau_xy[0:total_cells], tau_yy[0:total_cells])
+        for (int jj = 0; jj < Ny; ++jj) {
+            for (int ii = 0; ii < Nx; ++ii) {
+                const int i = ii + Ng;
+                const int j = jj + Ng;
+                const int idx_cell = j * cell_stride + i;
+                const int idx_out  = jj * Nx + ii;
+                
+                // Extract G coefficients from NN output
+                double G[NUM_BASIS] = {0.0, 0.0, 0.0, 0.0};
+                const int out_base = idx_out * output_dim;
+                for (int n = 0; n < NUM_BASIS && n < output_dim; ++n) {
+                    G[n] = nn_outputs[out_base + n];
+                }
+                
+                // Get basis tensors
+                const int basis_base = idx_out * 12;
+                
+                // Construct anisotropy tensor: b_ij = sum_n G_n * T^n_ij
+                double b_xx = 0.0, b_xy = 0.0, b_yy = 0.0;
+                for (int n = 0; n < NUM_BASIS; ++n) {
+                    b_xx += G[n] * basis[basis_base + n*3 + 0];
+                    b_xy += G[n] * basis[basis_base + n*3 + 1];
+                    b_yy += G[n] * basis[basis_base + n*3 + 2];
+                }
+                
+                // Compute Reynolds stresses
+                const double k_val = k[idx_cell];
+                const double k_safe = (k_val > 0.0) ? k_val : 0.0;
+                tau_xx[idx_cell] = 2.0 * k_safe * (b_xx + 1.0/3.0);
+                tau_xy[idx_cell] = 2.0 * k_safe * b_xy;
+                tau_yy[idx_cell] = 2.0 * k_safe * (b_yy + 1.0/3.0);
+                
+                // Compute equivalent eddy viscosity
+                const double dudy_v = dudy[idx_cell];
+                const double dvdx_v = dvdx[idx_cell];
+                const double Sxy = 0.5 * (dudy_v + dvdx_v);
+                
+                double nu_t_val = 0.0;
+                if (fabs(Sxy) > 1e-10) {
+                    nu_t_val = fabs(-b_xy * k_val / Sxy);
+                } else {
+                    // Fallback: use strain magnitude
+                    const double dudx_v = dudx[idx_cell];
+                    const double dvdy_v = dvdy[idx_cell];
+                    const double Sxx = dudx_v;
+                    const double Syy = dvdy_v;
+                    const double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
+                    if (S_mag > 1e-10) {
+                        const double b_mag = sqrt(b_xx*b_xx + 2.0*b_xy*b_xy + b_yy*b_yy);
+                        nu_t_val = k_val * b_mag / S_mag;
+                    }
+                }
+                
+                // Clip to reasonable bounds
+                const double max_nu_t = 10.0 * nu_ref;
+                nu_t_val = (nu_t_val > 0.0) ? nu_t_val : 0.0;
+                nu_t_val = (nu_t_val < max_nu_t) ? nu_t_val : max_nu_t;
+                
+                // Check for NaN/Inf
+                if (nu_t_val != nu_t_val || nu_t_val > 1e30) {
+                    nu_t_val = 0.0;
+                }
+                
+                nu_t[idx_cell] = nu_t_val;
+            }
+        }
     }
 #else
     (void)nn_outputs; (void)basis; (void)k;
     (void)dudx; (void)dudy; (void)dvdx; (void)dvdy;
     (void)nu_t; (void)tau_xx; (void)tau_xy; (void)tau_yy;
-    (void)n_cells; (void)output_dim; (void)nu_ref;
+    (void)Nx; (void)Ny; (void)Ng; (void)cell_stride; (void)total_cells;
+    (void)output_dim; (void)nu_ref;
 #endif

Refactor postprocess_nn_outputs_gpu to correctly handle ghosted fields by using
a 2D loop with strides, similar to other GPU kernels. Additionally, ensure tau
buffers are correctly mapped to the device when compute_tau is true.

src/gpu_kernels.cpp [380-405]

 void postprocess_nn_outputs_gpu(
     const double* nn_outputs,
     const double* basis,
     const double* k,
     const double* dudx,
     const double* dudy,
     const double* dvdx,
     const double* dvdy,
     double* nu_t,
     double* tau_xx,
     double* tau_xy,
     double* tau_yy,
-    int n_cells,
+    int Nx, int Ny, int Ng,
+    int cell_stride, int total_cells,
     int output_dim,
     double nu)
 {
 #ifdef USE_GPU_OFFLOAD
-    const int NUM_BASIS = 4;
     const bool compute_tau = (tau_xx != nullptr);
-    
-    // CRITICAL: map(present:...) indicates these arrays are already mapped by turbulence model
-    // Note: tau arrays may be nullptr, but map clause is safe (runtime ignores null pointers)
-    #pragma omp target teams distribute parallel for \
-        map(present: nn_outputs[0:(n_cells*output_dim)], basis[0:(n_cells*12)], \
-                     k[0:n_cells], dudx[0:n_cells], dudy[0:n_cells], \
-                     dvdx[0:n_cells], dvdy[0:n_cells], nu_t[0:n_cells])
-    for (int idx = 0; idx < n_cells; ++idx) {
-        // Extract G coefficients from NN output
-        double G[4] = {0.0, 0.0, 0.0, 0.0};
 
+    if (!compute_tau) {
+        #pragma omp target teams distribute parallel for collapse(2) \
+            map(present: nn_outputs[0:(Nx*Ny*output_dim)], basis[0:(Nx*Ny*12)], \
+                         k[0:total_cells], dudx[0:total_cells], dudy[0:total_cells], \
+                         dvdx[0:total_cells], dvdy[0:total_cells], nu_t[0:total_cells])
+        for (int jj = 0; jj < Ny; ++jj) {
+            for (int ii = 0; ii < Nx; ++ii) {
+                const int i = ii + Ng;
+                const int j = jj + Ng;
+                const int idx_cell = j * cell_stride + i;   // ghosted field index
+                const int idx_out  = jj * Nx + ii;          // interior index for nn/basis
+
+                // ... use idx_out for nn_outputs/basis, idx_cell for k/gradients/nu_t ...
+            }
+        }
+    } else {
+        #pragma omp target teams distribute parallel for collapse(2) \
+            map(present: nn_outputs[0:(Nx*Ny*output_dim)], basis[0:(Nx*Ny*12)], \
+                         k[0:total_cells], dudx[0:total_cells], dudy[0:total_cells], \
+                         dvdx[0:total_cells], dvdy[0:total_cells], nu_t[0:total_cells], \
+                         tau_xx[0:total_cells], tau_xy[0:total_cells], tau_yy[0:total_cells])
+        for (int jj = 0; jj < Ny; ++jj) {
+            for (int ii = 0; ii < Nx; ++ii) {
+                const int i = ii + Ng;
+                const int j = jj + Ng;
+                const int idx_cell = j * cell_stride + i;
+                const int idx_out  = jj * Nx + ii;
+
+                // ... same, plus write tau_* at idx_cell ...
+            }
+        }
+    }
+#else
+    (void)nn_outputs; (void)basis; (void)k; (void)dudx; (void)dudy; (void)dvdx; (void)dvdy;
+    (void)nu_t; (void)tau_xx; (void)tau_xy; (void)tau_yy;
+    (void)Nx; (void)Ny; (void)Ng; (void)cell_stride; (void)total_cells;
+    (void)output_dim; (void)nu;
+#endif
+}
+
Suggestion importance[1-10]: 10


Why: This suggestion identifies two critical bugs: incorrect memory indexing for ghosted fields and missing map clauses for tau buffers, both of which would cause memory corruption or crashes in the GPU pipeline.

High
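The two index spaces can be sketched with toy dimensions to show why the distinction matters: NN outputs and basis tensors are packed interior-only (Nx·Ny), while ghosted fields such as k and nu_t carry Ng ghost layers and use a row stride (values here are made up):

```python
# Interior-only vs ghosted indexing, as in the fixed kernel
Nx, Ny, Ng = 4, 3, 2
cell_stride = Nx + 2*Ng

pairs = []
for jj in range(Ny):
    for ii in range(Nx):
        idx_out = jj * Nx + ii                          # nn_outputs/basis
        idx_cell = (jj + Ng) * cell_stride + (ii + Ng)  # k/gradients/nu_t
        pairs.append((idx_out, idx_cell))
```

Using the flat interior index to address a ghosted field (as the original 1D loop did) would read the wrong cells and write nu_t into the ghost band, which is exactly the corruption the refactor eliminates.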
Write outputs to device buffer
Suggestion Impact:The commit removed the host-side nu_t_ptr and updated the GPU kernel call to use device_view->nu_t, aligning output writes with the device buffer and avoiding invalid host-pointer access on GPU.

code diff:

@@ -241,11 +241,10 @@
         {
             TIMED_SCOPE("nn_mlp_postprocess_gpu");
             double* out_ptr = outputs_flat_.data();
-            double* nu_t_ptr = nu_t.data().data();
             
             gpu_kernels::postprocess_mlp_outputs_gpu(
-                out_ptr,      // NN outputs (n_cells * 1)
-                nu_t_ptr,     // nu_t field with ghosts
+                out_ptr,             // NN outputs (n_cells * 1)
+                device_view->nu_t,   // nu_t field on device (with ghosts)
                 Nx, Ny, Ng,
                 cell_stride,
                 nu_t_max_

In the TurbulenceNNMLP::update GPU path, pass the device-mapped buffer
device_view->nu_t to postprocess_mlp_outputs_gpu instead of the host-side
pointer nu_t.data().data() to prevent memory access errors on the GPU.

src/turbulence_nn_mlp.cpp [244-252]

-double* nu_t_ptr = nu_t.data().data();
-
 gpu_kernels::postprocess_mlp_outputs_gpu(
-    out_ptr,      // NN outputs (n_cells * 1)
-    nu_t_ptr,     // nu_t field with ghosts
+    out_ptr,             // NN outputs (n_cells * 1)
+    device_view->nu_t,   // nu_t field on device (with ghosts)
     Nx, Ny, Ng,
     cell_stride,
     nu_t_max_
 );
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical bug where a host pointer nu_t.data().data() is passed to a GPU kernel expecting a device pointer, which would lead to memory errors or crashes.

High
Validate reference field completeness
Suggestion Impact: The commit adds `<limits>`, initializes the ScalarField with quiet_NaN(), tracks how many unique cells are set (using isfinite checks), and throws if the number set doesn't match the expected interior cell count. It also slightly refactors parsing to skip invalid lines.

code diff:

@@ -18,6 +18,7 @@
 #include <fstream>
 #include <sstream>
 #include <cstring>
+#include <limits>
 
 #ifdef USE_GPU_OFFLOAD
 #include <omp.h>
@@ -38,8 +39,10 @@
         throw std::runtime_error("Cannot open reference file: " + filename);
     }
     
-    ScalarField field(mesh);
+    // Initialize with NaN to detect unpopulated cells
+    ScalarField field(mesh, std::numeric_limits<double>::quiet_NaN());
     std::string line;
+    int num_set = 0;
     
     while (std::getline(file, line)) {
         // Skip comments and blank lines
@@ -47,26 +50,37 @@
         
         std::istringstream iss(line);
         double x, y, value;
-        if (iss >> x >> y >> value) {
-            // Find closest mesh point (simple nearest-neighbor)
-            int best_i = -1, best_j = -1;
-            double min_dist = 1e30;
-            for (int j = mesh.j_begin(); j < mesh.j_end(); ++j) {
-                for (int i = mesh.i_begin(); i < mesh.i_end(); ++i) {
-                    double dx = mesh.x(i) - x;
-                    double dy = mesh.y(j) - y;
-                    double dist = dx*dx + dy*dy;
-                    if (dist < min_dist) {
-                        min_dist = dist;
-                        best_i = i;
-                        best_j = j;
-                    }
+        if (!(iss >> x >> y >> value)) continue;
+        
+        // Find closest mesh point (simple nearest-neighbor)
+        int best_i = -1, best_j = -1;
+        double min_dist = 1e30;
+        for (int j = mesh.j_begin(); j < mesh.j_end(); ++j) {
+            for (int i = mesh.i_begin(); i < mesh.i_end(); ++i) {
+                const double dx = mesh.x(i) - x;
+                const double dy = mesh.y(j) - y;
+                const double dist = dx*dx + dy*dy;
+                if (dist < min_dist) {
+                    min_dist = dist;
+                    best_i = i;
+                    best_j = j;
                 }
             }
-            if (best_i >= 0 && best_j >= 0) {
-                field(best_i, best_j) = value;
+        }
+        if (best_i >= 0 && best_j >= 0) {
+            // Count only if this cell wasn't already set
+            if (!std::isfinite(field(best_i, best_j))) {
+                ++num_set;
             }
-        }
+            field(best_i, best_j) = value;
+        }
+    }
+    
+    // Verify all interior cells were populated
+    const int expected = (mesh.i_end() - mesh.i_begin()) * (mesh.j_end() - mesh.j_begin());
+    if (num_set != expected) {
+        throw std::runtime_error("Reference file did not populate all interior cells: " +
+                                 std::to_string(num_set) + "/" + std::to_string(expected));
     }

Improve the robustness of read_scalar_field_from_dat by initializing the field
with a sentinel value (e.g., NaN) and verifying that all expected cells are
populated from the input file to prevent uninitialized data issues.

tests/test_cpu_gpu_consistency.cpp [35-73]

 ScalarField read_scalar_field_from_dat(const std::string& filename, const Mesh& mesh) {
     std::ifstream file(filename);
     if (!file) {
         throw std::runtime_error("Cannot open reference file: " + filename);
     }
-    
-    ScalarField field(mesh);
+
+    ScalarField field(mesh, std::numeric_limits<double>::quiet_NaN());
     std::string line;
-    
+
+    int num_set = 0;
+
     while (std::getline(file, line)) {
-        // Skip comments and blank lines
         if (line.empty() || line[0] == '#') continue;
-        
+
         std::istringstream iss(line);
         double x, y, value;
         if (iss >> x >> y >> value) {
-            // Find closest mesh point (simple nearest-neighbor)
             int best_i = -1, best_j = -1;
             double min_dist = 1e30;
+
             for (int j = mesh.j_begin(); j < mesh.j_end(); ++j) {
                 for (int i = mesh.i_begin(); i < mesh.i_end(); ++i) {
-                    double dx = mesh.x(i) - x;
-                    double dy = mesh.y(j) - y;
-                    double dist = dx*dx + dy*dy;
+                    const double dx = mesh.x(i) - x;
+                    const double dy = mesh.y(j) - y;
+                    const double dist = dx*dx + dy*dy;
                     if (dist < min_dist) {
                         min_dist = dist;
                         best_i = i;
                         best_j = j;
                     }
                 }
             }
+
             if (best_i >= 0 && best_j >= 0) {
+                if (!std::isfinite(field(best_i, best_j))) {
+                    ++num_set;
+                }
                 field(best_i, best_j) = value;
             }
         }
     }
-    
+
+    const int expected = mesh.Nx * mesh.Ny;
+    if (num_set != expected) {
+        throw std::runtime_error("Reference file did not populate all interior cells: " +
+                                 std::to_string(num_set) + "/" + std::to_string(expected));
+    }
+
     return field;
 }
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential for non-deterministic behavior if the input .dat file doesn't populate all mesh cells, proposing a robust solution by initializing with NaN and verifying completeness.

Medium
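The NaN-sentinel pattern proposed above can be distilled into a standalone sketch: initialize with NaN, scatter values in, then verify completeness. All names here are illustrative, not the repository's actual API.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Fill the field with NaN, scatter (index, value) samples in, then throw
// if any cell was left unpopulated. Duplicate indices count only once.
std::vector<double> scatter_with_check(
    int n_cells, const std::vector<std::pair<int, double>>& samples) {
    std::vector<double> field(n_cells, std::numeric_limits<double>::quiet_NaN());
    int num_set = 0;
    for (const auto& s : samples) {
        if (!std::isfinite(field[s.first])) ++num_set;  // count unique cells only
        field[s.first] = s.second;
    }
    if (num_set != n_cells) {
        throw std::runtime_error("unpopulated cells: " + std::to_string(num_set) +
                                 "/" + std::to_string(n_cells));
    }
    return field;
}
```

The same idea works for any scatter-style loader where silent gaps would otherwise surface as nondeterministic comparisons downstream.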
Clear stale comparison outputs
Suggestion Impact: The script now deletes the cpu_gpu_comparison directory (rm -rf cpu_gpu_comparison) before recreating it, ensuring a clean reference output directory and preventing stale files from previous runs.

code diff:

 echo "--- Step 1: Generate CPU reference outputs ---"
+rm -rf cpu_gpu_comparison
 mkdir -p cpu_gpu_comparison

Add rm -rf cpu_gpu_comparison to ensure the CPU reference directory is clean
before generating new outputs, preventing interference from stale files from
previous runs.

.github/scripts/compare_cpu_gpu_builds.sh [28-33]

 echo "--- Step 1: Generate CPU reference outputs ---"
+rm -rf cpu_gpu_comparison
 mkdir -p cpu_gpu_comparison
 ./test_cpu_gpu_consistency --dump-prefix cpu_gpu_comparison/cpu_ref || {
     echo "[FAIL] CPU reference generation failed!"
     exit 1
 }

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: This is a valid and important suggestion for CI script reliability, as it prevents stale artifacts from a previous run from causing false passes or failures, which improves the test's integrity.

Medium
Avoid unzipping stale archives

Add a command to remove stale zip files before downloading and re-introduce a
check to ensure the kaggle CLI is installed for better robustness.

scripts/download_mcconkey_data.sh [42-53]

 export KAGGLE_CONFIG_DIR="${HOME}/.kaggle"
+
+if ! command -v kaggle >/dev/null 2>&1; then
+    echo "ERROR: Kaggle CLI not found in PATH (required to download ${DATASET_NAME})." >&2
+    exit 1
+fi
 
 ZIP_DIR="${OUTPUT_DIR}/_kaggle_zip"
 mkdir -p "${ZIP_DIR}"
+rm -f "${ZIP_DIR}"/*.zip
 
 kaggle datasets download -d "$DATASET_NAME" -p "$ZIP_DIR"
 
 ZIP_FILE="$(ls -1t "${ZIP_DIR}"/*.zip 2>/dev/null | head -n 1 || true)"
 if [[ -z "${ZIP_FILE}" ]]; then
     echo "ERROR: Kaggle download completed but no .zip was found in ${ZIP_DIR}" >&2
     exit 1
 fi
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies a potential issue where stale .zip files could be used, and rm -f is a good fix. It also improves error handling by checking for the kaggle CLI, making the script more robust.

Low
✅ Suggestions up to commit ab96b34
Category | Suggestion | Impact
Possible issue
Write GPU outputs to mapped buffers
Suggestion Impact: The commit replaced host pointers (nu_t.data(), tau_ij->*_data()) with device_view->nu_t and device_view->tau_* pointers for GPU postprocessing, and added runtime checks ensuring the required device_view buffers are present.

code diff:

+    // Validate device_view has all required buffers
+    if (!device_view->u_face || !device_view->v_face ||
+        !device_view->k || !device_view->omega ||
+        !device_view->dudx || !device_view->dudy || !device_view->dvdx || !device_view->dvdy ||
+        !device_view->wall_distance ||
+        !device_view->nu_t) {
+        throw std::runtime_error("NN-TBNN GPU pipeline: device_view missing required buffers");
+    }
+    if (tau_ij && (!device_view->tau_xx || !device_view->tau_xy || !device_view->tau_yy)) {
+        throw std::runtime_error("NN-TBNN GPU pipeline: tau_ij requested but tau buffers not provided in device_view");
+    }
+    
     {
         TIMED_SCOPE("nn_tbnn_full_gpu");
         
@@ -757,10 +769,12 @@
             double* dudy_ptr = device_view->dudy;
             double* dvdx_ptr = device_view->dvdx;
             double* dvdy_ptr = device_view->dvdy;
-            double* nu_t_ptr = nu_t.data().data();
-            double* tau_xx_ptr = tau_ij ? tau_ij->xx_data().data() : nullptr;
-            double* tau_xy_ptr = tau_ij ? tau_ij->xy_data().data() : nullptr;
-            double* tau_yy_ptr = tau_ij ? tau_ij->yy_data().data() : nullptr;
+            
+            // IMPORTANT: write results to device-mapped buffers (host pointers are not valid on device)
+            double* nu_t_ptr = device_view->nu_t;
+            double* tau_xx_ptr = device_view->tau_xx;
+            double* tau_xy_ptr = device_view->tau_xy;
+            double* tau_yy_ptr = device_view->tau_yy;
             

In the GPU post-processing step, use device-mapped pointers from device_view for
nu_t and tau outputs instead of host-side pointers to prevent memory errors.

src/turbulence_nn_tbnn.cpp [753-772]

 double* out_ptr = outputs_flat_.data();
 double* basis_ptr = basis_flat_.data();
 double* k_ptr = device_view->k;
 double* dudx_ptr = device_view->dudx;
 double* dudy_ptr = device_view->dudy;
 double* dvdx_ptr = device_view->dvdx;
 double* dvdy_ptr = device_view->dvdy;
-double* nu_t_ptr = nu_t.data().data();
-double* tau_xx_ptr = tau_ij ? tau_ij->xx_data().data() : nullptr;
-double* tau_xy_ptr = tau_ij ? tau_ij->xy_data().data() : nullptr;
-double* tau_yy_ptr = tau_ij ? tau_ij->yy_data().data() : nullptr;
+
+// IMPORTANT: write results to device-mapped buffers (host pointers are not valid on device)
+double* nu_t_ptr = device_view->nu_t;
+double* tau_xx_ptr = device_view->tau_xx;
+double* tau_xy_ptr = device_view->tau_xy;
+double* tau_yy_ptr = device_view->tau_yy;
 
 gpu_kernels::postprocess_nn_outputs_gpu(
     out_ptr, basis_ptr,
     k_ptr, dudx_ptr, dudy_ptr, dvdx_ptr, dvdy_ptr,
     nu_t_ptr,
     tau_xx_ptr, tau_xy_ptr, tau_yy_ptr,
     n_cells, mlp_.output_dim(),
     nu_
 );
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical bug where unmapped host pointers are used on the GPU, which would lead to crashes or memory corruption.

High
Preserve strain magnitude convention

Revert the change to the S_mag calculation. The new definition is a breaking
change that will silently invalidate existing trained models by altering feature
values.

include/features.hpp [35-38]

-/// Strain rate magnitude |S| = sqrt(S_ij * S_ij) (Frobenius norm)
+/// Strain rate magnitude |S| = sqrt(2 * S_ij * S_ij)
 double S_mag() const {
-    return std::sqrt(Sxx()*Sxx() + Syy()*Syy() + 2.0*Sxy()*Sxy());
+    return std::sqrt(2.0 * (Sxx()*Sxx() + Syy()*Syy() + 2.0*Sxy()*Sxy()));
 }
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies that changing the mathematical definition of the strain rate magnitude S_mag is a breaking change that will invalidate all previously trained models by silently altering feature inputs.

High
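The convention difference flagged above can be checked numerically. A minimal standalone sketch (function names are illustrative, not from `features.hpp`): for a 2D symmetric tensor the two definitions differ by exactly a factor of sqrt(2), so every feature built on |S| shifts if the convention changes.

```cpp
#include <cassert>
#include <cmath>

// Frobenius convention: |S| = sqrt(S_ij S_ij)
double s_mag_frobenius(double Sxx, double Syy, double Sxy) {
    return std::sqrt(Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy);
}

// Doubled convention (common in SST-style limiters): |S| = sqrt(2 S_ij S_ij)
double s_mag_doubled(double Sxx, double Syy, double Sxy) {
    return std::sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
}
```

For pure shear du/dy = 1 (so Sxy = 0.5), the Frobenius form gives 1/sqrt(2) while the doubled form gives exactly 1 — a silent sqrt(2) rescaling of any feature that a model was trained against.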
Validate device buffers before kernels
Suggestion Impact: The commit added pre-kernel validation of required device_view pointers and added an additional check/error for missing tau buffers when tau_ij is provided, matching the intent of preventing null-pointer GPU crashes. It also adjusted output pointers to use device_view buffers.

code diff:

+    // Validate device_view has all required buffers
+    if (!device_view->u_face || !device_view->v_face ||
+        !device_view->k || !device_view->omega ||
+        !device_view->dudx || !device_view->dudy || !device_view->dvdx || !device_view->dvdy ||
+        !device_view->wall_distance ||
+        !device_view->nu_t) {
+        throw std::runtime_error("NN-TBNN GPU pipeline: device_view missing required buffers");
+    }
+    if (tau_ij && (!device_view->tau_xx || !device_view->tau_xy || !device_view->tau_yy)) {
+        throw std::runtime_error("NN-TBNN GPU pipeline: tau_ij requested but tau buffers not provided in device_view");
+    }

Before launching GPU kernels, add runtime checks to verify that all required
pointers within the device_view struct are non-null to prevent crashes.

src/turbulence_nn_tbnn.cpp [692-707]

 // GPU path: require device_view and gpu_ready (no fallback)
 if (!device_view || !gpu_ready_) {
     throw std::runtime_error("NN-TBNN GPU pipeline requires device_view and GPU buffers initialized");
 }
+if (!device_view->u_face || !device_view->v_face ||
+    !device_view->k || !device_view->omega ||
+    !device_view->dudx || !device_view->dudy || !device_view->dvdx || !device_view->dvdy ||
+    !device_view->wall_distance ||
+    !device_view->nu_t) {
+    throw std::runtime_error("NN-TBNN GPU pipeline: device_view missing required buffers");
+}
+if (tau_ij && (!device_view->tau_xx || !device_view->tau_xy || !device_view->tau_yy)) {
+    throw std::runtime_error("NN-TBNN GPU pipeline: tau_ij requested but tau buffers not provided");
+}
 
 {
     TIMED_SCOPE("nn_tbnn_full_gpu");
-    
+
     // Ensure GPU buffers are allocated
     allocate_gpu_buffers(n_cells);
-    
-    const int total_cells = mesh.total_cells();  // (Nx+2Ng)*(Ny+2Ng) - was incorrectly using Nx twice
+
+    const int total_cells = mesh.total_cells();
     ...

@sbryngelson
Member Author

/improve

@qodo-code-review

Persistent suggestions updated to latest commit c9f0ab7

@sbryngelson
Member Author

/improve

@qodo-code-review

Persistent suggestions updated to latest commit ab96b34


Copilot AI left a comment


Pull request overview

This PR introduces production-ready trained TBNN models while cleaning up legacy example code and significantly enhancing the test infrastructure. The changes include two fully-trained neural network models (channel flow and periodic hills), comprehensive test coverage for turbulence features, improved GPU/CPU backend handling, and better configuration validation.

Key Changes:

  • Added two trained TBNN models with complete weights, biases, and normalization parameters
  • Fixed parameter order bug in GEP turbulence model initialization
  • Enhanced configuration validation requiring explicit NN model specification
  • Expanded test suite with analytic verification tests and comprehensive turbulence model validation
  • Improved backend flexibility with graceful CPU fallback when GPU unavailable

Reviewed changes

Copilot reviewed 74 out of 75 changed files in this pull request and generated 3 comments.

Summary per file:

| File | Description |
| --- | --- |
| `src/turbulence_gep.cpp` | Fixed parameter order from `(nu_, u_ref_, delta_)` to `(nu_, delta_, u_ref_)` in `set_reference()` calls |
| `src/config.cpp` | Added validation requiring explicit NN model specification with improved error messages |
| `include/config.hpp` | Changed default NN paths from `"data/"` to empty strings |
| `tests/test_turbulence_features.cpp` | New comprehensive test suite for EARSM, GEP, baseline models with analytic flow verification |
| `tests/test_features.cpp` | Replaced generic tests with analytic verification for pure shear, strain, and rotation flows |
| `tests/test_perturbed_channel.cpp` | Updated to use trained tbnn_channel_caseholdout model |
| `tests/test_nn_integration.cpp` | Updated to use trained tbnn_channel_caseholdout model |
| `tests/test_cpu_gpu_consistency.cpp` | Refactored for graceful CPU fallback when GPU unavailable |
| `tests/test_backend_execution.cpp` | Renamed from GPU-specific to backend-agnostic with CPU support |
| `src/turbulence_nn_tbnn.cpp` | Implemented full GPU pipeline with feature computation and postprocessing |
| `src/turbulence_nn_mlp.cpp` | Implemented full GPU pipeline with MLP-specific feature computation |
| `src/gpu_kernels.cpp` | Added GPU kernels for MLP features, TBNN features, and output postprocessing |
| `include/gpu_kernels.hpp` | Added declarations for new GPU kernel functions |
| `data/models/tbnn_channel_caseholdout/*` | CRITICAL ISSUE: Invalid normalization statistics with inf/extreme values |
| `data/models/tbnn_phll_caseholdout/*` | Added trained periodic hills model with valid normalization statistics |
| `scripts/train_realdata_local.sh` | New comprehensive training script for local TBNN/MLP training |
| `scripts/download_mcconkey_data.sh` | Improved Kaggle dataset download robustness |

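The `set_reference()` argument-order bug noted in the summary above is a classic hazard of same-typed positional parameters: swapping two `double`s compiles cleanly. One hedged sketch of a guard (illustrative names only, not the solver's actual API) wraps each physical quantity in a distinct struct so a swapped call becomes a compile error:

```cpp
#include <cassert>

// Distinct wrapper types make argument order a compile-time property:
// set_reference(Nu{...}, Delta{...}, URef{...}) cannot be called with
// the delta and u_ref operands transposed.
struct Nu    { double value; };
struct Delta { double value; };
struct URef  { double value; };

struct ReferenceScales {
    double nu = 0.0, delta = 0.0, u_ref = 0.0;
    void set_reference(Nu n, Delta d, URef u) {
        nu    = n.value;
        delta = d.value;
        u_ref = u.value;
    }
};

// Small helper used to exercise the sketch.
double delta_after_set(double nu, double delta, double u_ref) {
    ReferenceScales r;
    r.set_reference(Nu{nu}, Delta{delta}, URef{u_ref});
    return r.delta;
}
```

The cost is a little ceremony at call sites; the benefit is that the class of bug fixed in `turbulence_gep.cpp` cannot recur silently.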

@sbryngelson
Member Author

/improve

@qodo-code-review

Persistent suggestions updated to latest commit ddc4146

@sbryngelson
Member Author

/improve

@qodo-code-review

Persistent suggestions updated to latest commit 282ad8d

@sbryngelson sbryngelson merged commit 7dc0496 into master Dec 22, 2025
3 checks passed
@sbryngelson sbryngelson deleted the neuralnets branch December 22, 2025 23:01