Conversation

@Yael-F Yael-F commented Sep 2, 2025

This change addresses the confusion between business domain terminology and ML classification metrics by introducing a clear separation between the security analysis context (ISSUE/NON_ISSUE)
and the ML evaluation context (TRUE_POSITIVE/FALSE_POSITIVE).

@Yael-F Yael-F force-pushed the feature/rename-true-false-positive-to-issue-non-issue branch 2 times, most recently from 783dfb3 to 73be35a Compare September 8, 2025 10:16
@GuyZivRH GuyZivRH force-pushed the feature/rename-true-false-positive-to-issue-non-issue branch from 73be35a to 5a1b344 Compare September 16, 2025 06:42
deploy/README.md Outdated
REPO_REMOTE_URL="https://download.devel.redhat.com/brewroot/vol/rhel-10/packages/systemd/257/9.el10/src/systemd-257-9.el10.src.rpm" \
INPUT_REPORT_FILE_PATH="https://docs.google.com/spreadsheets/d/1NPGmERBsSTdHjQK2vEocQ-PvQlRGGLMds02E_RGF8vY/export?format=csv" \
FALSE_POSITIVES_URL="https://gitlab.cee.redhat.com/osh/known-false-positives/-/raw/master/systemd/ignore.err"
NON_ISSUES_URL="https://gitlab.cee.redhat.com/osh/known-non-issues/-/raw/master/systemd/ignore.err"
Collaborator

This is wrong. Repo URL should still be the same.

Contributor Author

You're absolutely right - the parameter name NON_ISSUES_URL pointing to a URL containing "false-positives" is still confusing and doesn't align with our terminology standardization.

We can add clear documentation in Makefile/README explaining this external URL discrepancy, or propose a new repository structure to the osh team for better alignment with our updated terminology.
Which approach would you prefer?

@Yael-F Yael-F force-pushed the feature/rename-true-false-positive-to-issue-non-issue branch from 5a1b344 to 8ac46c5 Compare September 18, 2025 09:37
@github-actions github-actions bot left a comment

Okay, I've carefully reviewed all the code changes and summaries you've provided. Here's a consolidated summary of my findings, focusing on the most critical issues and suggestions to improve the pull request:

Overall Assessment:

The primary objective of these changes is to replace the term "false positive" with "non-issue" to improve clarity and reduce potential misinterpretations. While the intention is good, the execution has introduced some inconsistencies and semantic ambiguities, particularly regarding the interpretation of TRUE_POSITIVE and ISSUE statuses. This is a critical concern that needs to be addressed before merging the changes.

Key Issues and Suggestions:

  1. Inconsistent Interpretation of TRUE_POSITIVE vs. ISSUE (Critical): The most significant issue is the apparent reinterpretation of TRUE_POSITIVE as ISSUE. This is problematic because it blurs the distinction between a confirmed vulnerability (true positive) and a potentially problematic finding that requires further investigation (an issue).

    • Suggestion:
      • Clarify the Intended Meaning: Define the exact meaning of ISSUE and TRUE_POSITIVE (or a renamed equivalent) within the CVEValidationStatus enum and throughout the codebase. Does ISSUE represent a potential problem, or a confirmed one?
      • Revert Changes (if necessary): If ISSUE is not intended to replace the concept of a confirmed vulnerability, revert the changes that replace TRUE_POSITIVE with ISSUE in test cases and default values.
      • Rename (if necessary): If TRUE_POSITIVE is being deprecated, introduce a new name like CONFIRMED_VULNERABILITY or VALID_ISSUE to clearly distinguish it from a potential ISSUE. Update tests to use the CONFIRMED_VULNERABILITY status where analysis has confirmed an issue (see the sketch after this list).
  2. Incomplete Renaming (Critical): There are instances where the renaming from "false positive" to "non-issue" is incomplete, leading to inconsistencies in variable names, file paths, and configuration options.

    • Suggestion:
      • Thorough Search: Conduct a comprehensive search across the entire project for any remaining occurrences of "false positive" (or related terms like "FP").
      • Update Consistently: Update all occurrences to use the "non-issue" terminology. Pay close attention to variable names, file paths, configuration options, comments, and documentation.
      • Rename Test Files: Rename test files so they stay consistent with the filter now in use. For example, rename the file to indicate that it now filters by KNOWN_NON_ISSUES_FILE_PATH.
  3. Overly Broad Exception Handling (High): Several instances of overly broad except Exception as e: blocks have been identified. This can mask underlying problems and make debugging more difficult.

    • Suggestion:
      • Use Specific Exception Types: Replace except Exception as e: with more specific exception types (e.g., openai.APIError, requests.exceptions.RequestException, ValueError, FileNotFoundError).
      • Log Exception Types: When catching exceptions, log the exception type along with the message to provide more context for debugging.
      • Consider Re-Raising (Carefully): Determine whether it's appropriate to re-raise the exception after logging. In some cases, it might be better to handle the exception by returning a default value or a more informative error message.
  4. Potential Code Duplication (Medium): There are suggestions of potential code duplication in areas like configuration validation and handling similar issues.

    • Suggestion:
      • Refactor into Helper Functions: Refactor duplicated code into reusable helper functions to improve maintainability and reduce the risk of errors.
      • Use Dataclasses for Configuration: Consider using dataclasses to manage configuration variables and provide type checking.
  5. Inconsistent Coding Style (Low): There are inconsistencies in coding style, such as the use of f-strings vs. .format() for logging, and overly verbose lambda functions.

    • Suggestion:
      • Choose a Style and Be Consistent: Select a preferred coding style (e.g., f-strings for logging) and apply it consistently throughout the codebase.
      • Simplify Lambdas: Simplify lambda functions where possible by passing values directly instead of using RunnableLambda.
  6. Missing Documentation/Comments (Medium): Some code changes lack sufficient documentation or comments to explain the rationale behind them.

    • Suggestion:
      • Add Explanatory Comments: Add comments to explain the purpose of code sections, the meaning of variables, and the reasoning behind design decisions.
      • Update Docstrings: Update docstrings to reflect the new terminology and the intended behavior of functions and methods.
  7. Inefficient Metrics Calculation (Low): The initial implementation of metrics calculation involved inefficient list creation.

    • Suggestion: Use the updated code that performs calculations directly from tp, tn, fp, and fn values, avoiding the need to create intermediate lists.
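
As a concrete illustration of the option in point 1, here is a minimal sketch of an enum that keeps the two notions apart. The member names and string values are illustrative, not the PR's actual definitions:

from enum import Enum

class CVEValidationStatus(Enum):
    # Illustrative three-way split; the PR must pin down the real semantics.
    NON_ISSUE = "NON_ISSUE"  # finding ruled out
    ISSUE = "ISSUE"  # potentially problematic, needs further investigation
    CONFIRMED_VULNERABILITY = "CONFIRMED_VULNERABILITY"  # analysis confirmed the vulnerability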

Specific Actions to Take:

  1. Hold the Pull Request: Do not merge this pull request until the issues related to TRUE_POSITIVE and ISSUE are resolved and thoroughly tested.
  2. Update the Commit Message: Provide a clear and detailed commit message explaining the rationale behind all the changes, especially the interpretation of ISSUE and its relationship to TRUE_POSITIVE.
  3. Conduct a Code Review: Perform a thorough code review, paying close attention to the points raised in this summary and the file-specific comments.
  4. Add More Tests: Add more tests to specifically verify the behavior of the code in different scenarios, including cases where issues are identified as non-issues.
  5. Update Documentation: Update any relevant documentation to reflect the new terminology, the intended behavior of the code, and the meaning of CVEValidationStatus values.

By addressing these issues and suggestions, you can significantly improve the quality of the pull request and ensure that the changes are implemented correctly and consistently. This will lead to a more robust, maintainable, and understandable codebase.

Okay, I've reviewed the changes. Here's a summary and some specific comments:

Review Summary:

The main change is renaming "false positives" to "non-issues" throughout the codebase and configuration. This is a good change, as "non-issue" is generally a clearer and more neutral term. The changes look consistent.

File-Specific Comments:

  • README.md:

    • Good clarity improvement, replacing "False Positives" with "(Non-Issues)".
  • config/default_config.yaml:

    • Good, consistent rename.
  • deploy/Makefile:

    • The renaming of FALSE_POSITIVES_URL to NON_ISSUES_URL is good, but also update the comment FALSE_POSITIVES_URL ?= false/positives/url to NON_ISSUES_URL ?= non/issues/url for consistency.
  • deploy/README.md:

    • Good, consistent rename.
  • deploy/tekton/pipeline.yaml:

    • No changes in this diff.

Overall, the changes are straightforward and well-executed. The renaming improves clarity and consistency. Just remember to update the comment in deploy/Makefile.

Okay, I've reviewed the YAML files. Here's a summary and some specific comments:

Review Summary

The primary change is renaming FALSE_POSITIVES_URL to NON_ISSUES_URL and USE_KNOWN_FALSE_POSITIVE_FILE to USE_KNOWN_NON_ISSUES_FILE. This seems like a reasonable change to use more inclusive language. The rest of the changes propagate these renames throughout the Tekton pipeline definitions.

File Comments

  • pipeline.yaml:

    • Good job on consistently updating FALSE_POSITIVES_URL to NON_ISSUES_URL throughout the file.
    • No issues found.
  • deploy/tekton/pipelinerun.yaml:

    • Good job on consistently updating FALSE_POSITIVES_URL to NON_ISSUES_URL throughout the file.
    • No issues found.
  • deploy/tekton/tasks/execute_sast_ai_workflow.yaml:

    • Good job on consistently updating USE_KNOWN_FALSE_POSITIVE_FILE to USE_KNOWN_NON_ISSUES_FILE throughout the file.
    • No issues found.

Overall, the changes seem well-executed and consistent. The renaming is a positive step.

Okay, I've reviewed the changes. Here's a summary of the issues and suggestions:

Review Summary:

The primary change appears to be renaming "false positives" to "non-issues." This is a good change as "non-issues" is more accurate and less prone to misinterpretation. However, the changes are not fully consistent across all files and parameters. There are also opportunities to improve error message clarity.

File-Specific Comments:

  • deploy/tekton/tasks/trivy.yaml:

    • Issue: Inconsistency. Some parameters still use "FALSE_POSITIVE" while others use "NON_ISSUES".
    • Suggestion: Rename KNOWN_FALSE_POSITIVE_FILE_PATH to KNOWN_NON_ISSUES_FILE_PATH and USE_KNOWN_FALSE_POSITIVE_FILE to USE_KNOWN_NON_ISSUES_FILE for consistency.
  • deploy/tekton/tasks/fetch_false_positives.yaml:

    • Suggestion: The error messages could be more descriptive. Instead of just "Error: Could not fetch...", include the URL being fetched in the error message. This will aid in debugging.
  • deploy/tekton/tasks/validate_urls.yaml:

    • Issue: Incomplete change. The script still references FP_URL and prints Validating fal.
    • Suggestion: Change FP_URL to NON_ISSUES_URL and correct the print statement to Validating non-issues URL.

In general, ensure that find-and-replace operations are complete and thorough when renaming variables or concepts across multiple files. Also, think about the debugging experience and make error messages as helpful as possible.
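
To make the thorough-search point actionable, a small helper along these lines can flag leftovers; this is a generic sketch, not part of the PR:

import pathlib
import re

# Flags leftover "false positive" terminology. The bare abbreviation "FP" is
# too noisy to match automatically, so grep for it separately where relevant.
PATTERN = re.compile(r"false[\s_-]?positive", re.IGNORECASE)

for path in pathlib.Path(".").rglob("*"):
    if path.is_file() and path.suffix in {".py", ".yaml", ".yml", ".md"}:
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if PATTERN.search(line):
                print(f"{path}:{lineno}: {line.strip()}")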

Okay, I've reviewed the changes. Here's a summary and specific comments:

Review Summary:

The primary change is renaming variables and strings related to "false positives" to "non-issues". This appears to be a change in terminology, likely to be more accurate or sensitive. The changes seem consistent across the files.

File Comments:

  • .github/workflows/main.yml

    • Good job on changing FP_URL to NON_ISSUES_URL and updating the echo messages accordingly. The validation logic remains the same, which is appropriate.
  • docs/setup.md

    • Good job consistently updating KNOWN_FALSE_POSITIVE_FILE_PATH to KNOWN_NON_ISSUES_FILE_PATH and updating the description. The surrounding text and explanations are clear.

Overall, the changes are straightforward and well-executed. The renaming is consistently applied, and the comments are updated to reflect the new terminology. No immediate issues are apparent.

Okay, I've reviewed the code differences provided. Here's a summary of the issues and suggestions:

General Review Summary:

  • Clarity: The changes are generally clear.
  • Naming: The most significant change is the renaming of USE_KNOWN_FALSE_POSITIVE_FILE to USE_KNOWN_NON_ISSUES_FILE. This improves the code's semantic accuracy; the new name is clearer, more inclusive, and less accusatory.
  • Configuration: Some default config values changed. Ensure those changes align with the expected use case.
  • Duplication/Complexity: No significant issues.
  • Algorithm Efficiency: No issues.
  • Exception Handling: No issues.

File-Specific Comments:

  • File: (documentation file, name unavailable)

    • Issue: Renaming USE_KNOWN_FALSE_POSITIVE_FILE to USE_KNOWN_NON_ISSUES_FILE is a good change for clarity and avoiding negative connotations. Make sure this change is reflected everywhere in the codebase, not just in the config documentation.
    • Suggestion: Double-check all references to this configuration option in the code to ensure consistency with the new name.
  • File: src/ExcelWriter.py

    • Issue: Introduction of ISSUES_TITLE, NON_ISSUES_TITLE constants.
    • Suggestion: Show the diff where ISSUES_TITLE and NON_ISSUES_TITLE are defined.

Overall, the changes look good. The renaming of the config option is a positive step toward more accurate and inclusive language. Just ensure that rename is propagated correctly and that default config value changes align with expected use-case.

Okay, I've reviewed the code differences. Here's a summary of potential issues and suggestions for improvement:

Review Summary:

The changes primarily focus on renaming variables and strings related to "False Positives" to more general terms like "Non-Issues." This seems like a reasonable change to improve clarity and avoid potential misinterpretations. However, there are a few areas where the changes could be improved for better readability, maintainability, and potentially security (depending on the context of these strings).

File-Specific Comments:

File: src/ExcelWriter.py

  • Issue: The changes introduce new variables ISSUES_TITLE and NON_ISSUES_TITLE (presumably defined elsewhere, but not shown in this diff). Without seeing their actual values, it's difficult to assess the full impact. If these titles are user-configurable or derived from external input, there's a potential security risk of injection if not properly sanitized when writing to the Excel sheet.

  • Suggestion: Ensure that ISSUES_TITLE and NON_ISSUES_TITLE are properly sanitized and validated before being used in worksheet.write(). If the values can be dynamic, escape leading formula characters (=, +, -, @) so a cell value cannot be interpreted as a formula (see the sketch at the end of this review). Add type hints.

  • Issue: In write_confusion_matrix, "Verified False Positives" and "Verified True Positives" are changed to "Actual False Positives" and "Actual True Positives" respectively. This change in wording should be consistent with what these values represent.

  • Suggestion: Make sure the change in naming from "Verified" to "Actual" is correct and consistent with the rest of the system.

File: src/FilterKnownIssues.py

  • Issue: The change from KNOWN_FALSE_POSITIVE_TEMPLATES to KNOWN_NON_ISSUES_TEMPLATES is consistent with the overall renaming.
  • Suggestion: Ensure that KNOWN_NON_ISSUES_TEMPLATES is used in the same context as KNOWN_FALSE_POSITIVE_TEMPLATES was previously, and that the underlying data structure and usage are still valid.

Overall, the changes seem reasonable, but it's important to consider the security implications of using dynamic strings in ExcelWriter.py and to ensure consistency in the naming changes.
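
As a sketch of the sanitization meant above, assuming the concern is Excel formula injection through worksheet.write(), something like this neutralizes attacker-influenced strings (the helper name is hypothetical):

RISKY_PREFIXES = ("=", "+", "-", "@", "\t", "\r")

def sanitize_cell(value):
    # Prefixing a single quote makes Excel treat the content as literal text,
    # so a value like "=HYPERLINK(...)" cannot execute as a formula.
    if isinstance(value, str) and value.startswith(RISKY_PREFIXES):
        return "'" + value
    return value

# worksheet.write(row, col, sanitize_cell(ISSUES_TITLE))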

Okay, I've reviewed the code diff. Here's a summary of the issues and suggestions:

Review Summary:

The code introduces a mechanism to identify and handle known non-issue cases (previously referred to as false positives). The changes are generally good, but there are opportunities for clarification, simplification, and better naming. The code also validates input with safe_validate, which is good for security.

File-Specific Comments:

  • File: (The file content you provided)

    • Clarity: The comment "#try to filter by cwe too" in KnownIssueRetriever.get_relevant_known_issues suggests an intended feature. Either implement it or remove the comment.
    • Naming: Rename is_finding_known_false_positive to is_finding_known_non_issue, and KnownFalsePositive to KnownNonIssue, for consistency. This avoids double negatives.
    • Naming: Instead of error_trace, consider calling it finding_trace.
    • Docstrings: Update docstrings to reflect that we are looking for non-issues instead of false positives.
    • Efficiency: The code iterates through docs to create known_issues. If docs is very large, consider a generator expression or list comprehension for efficiency and readability.
    • Duplication: The doc.metadata.get(...) calls are repeated. Consider a helper that extracts the metadata in one place, improving readability and maintainability (see the sketch below).
    • Return Types: The function returns two dictionaries; consider returning a small class or dataclass instead (also sketched below).
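
A sketch of the last two points combined; the metadata keys are inferred from the template constants elsewhere in this PR and may not match the real ones:

from dataclasses import dataclass

@dataclass
class KnownNonIssue:
    issue_type: str
    issue_cwe: str
    error_trace: str
    reason: str

    @classmethod
    def from_doc(cls, doc) -> "KnownNonIssue":
        # Centralizes the repeated doc.metadata.get(...) calls in one place.
        md = doc.metadata
        return cls(
            issue_type=md.get("issue_type", ""),
            issue_cwe=md.get("issue_cwe", ""),
            error_trace=md.get("error_trace", ""),
            reason=md.get("reason", ""),
        )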

Okay, I've reviewed the code changes. Here's a summary and specific comments:

Review Summary:

The changes focus on renaming "false positives" to "non-issues" throughout the code, which is a good clarification. The code seems straightforward, and I don't see any immediate performance bottlenecks. The use of functions and constants for templates is good for maintainability.

File-Specific Comments:

  • known_issue_manager.py
    • Good job on renaming variables and functions to reflect the "non-issue" terminology. This improves clarity.
    • Consider adding a comment explaining why the threshold is set to a specific value in create_known_issue_retriever.
    • convert_similar_issues_to_examples_context_string: Add type hints so the return type is explicit.
    • In is_known_non_issue, the repeated lower() and strip() calls can be hoisted out of the if check.
    • Rename is_known_false_positive to is_known_non_issue for consistency with the rest of the file.

Overall, the changes look good. The renaming improves clarity, and the code is well-structured. Addressing the minor comments above will further enhance the code quality.

Okay, I've reviewed the code differences. Here's a summary of potential issues and suggestions:

Review Summary:

  • Clarity: The code is generally clear, but the purpose of some functions and variables could benefit from more descriptive names or comments.
  • Duplication: There's some potential duplication in how similar issues are handled.
  • Error Handling: The config_utils.py file includes basic validation, which is good.
  • Security: No immediate security concerns are apparent in these snippets.

File-Specific Comments:

File: src/Utils/ContextBuilders.py

  • Issue: convert_similar_issues_to_examples_context_string appears to be defined here as well as in known_issue_manager.py.
  • Suggestion: Remove the duplication by keeping a single shared implementation.

File: src/Utils/config_utils.py

  • Suggestion: Add a default value to config.get() calls to avoid potential None values causing errors later on. For example, in the validate_configurations function, use config.get("USE_KNOWN_NON_ISSUES_FILE", False) if False is a reasonable default. This makes the code more robust.
  • Suggestion: Consider using a dedicated configuration class (e.g., using dataclasses) to manage configuration variables and provide type checking. This can improve code maintainability and reduce errors related to misconfiguration.
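
A minimal sketch of the dataclass idea, using keys mentioned in this PR; the defaults and types are assumptions:

from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowConfig:
    use_known_non_issues_file: bool = False
    known_non_issues_file_path: str = ""
    similarity_error_threshold: int = 5

    @classmethod
    def from_dict(cls, raw: dict) -> "WorkflowConfig":
        # Typed, defaulted accessors replace scattered config.get() calls.
        return cls(
            use_known_non_issues_file=bool(raw.get("USE_KNOWN_NON_ISSUES_FILE", False)),
            known_non_issues_file_path=str(raw.get("KNOWN_NON_ISSUES_FILE_PATH", "")),
            similarity_error_threshold=int(raw.get("SIMILARITY_ERROR_THRESHOLD", 5)),
        )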

I'm ready for the next set of code differences.

Okay, I've reviewed the changes. Here's a summary of the issues and suggestions:

General Review Summary:

The changes primarily focus on configuration validation, file handling, and metrics calculation. There are improvements in clarity and robustness, particularly in configuration validation. However, there are opportunities to improve naming consistency and potentially simplify some logic.

File-Specific Comments:

1. src/common/config.py

  • Issue: The code validates environment variables and configuration settings, which is good. However, repeating similar validation logic for different configuration variables can become cumbersome.

  • Suggestion: Create a helper function to encapsulate the validation logic. This reduces code duplication and improves readability. For example:

    import os

    def get_config_value(key, required=True, cast=str):
        # "cast" instead of "type" avoids shadowing the built-in
        value = config.get(key)
        if value is None and required:
            raise ValueError(f"Configuration variable '{key}' is required.")
        return cast(value) if value is not None else None

    def get_env_variable(key, required=True):
        value = os.environ.get(key)
        if not value and required:
            raise ValueError(f"Environment variable '{key}' is not set.")
        return value

    # Usage:
    # llm_api_key = get_env_variable("LLM_API_KEY")
    # if config.get("RUN_WITH_CRITIQUE"):
    #     critique_model = get_config_value("CRITIQUE_LLM_MODEL_NAME")

2. src/Utils/file_utils.py

  • Issue: The renaming of KNOWN_FALSE_POSITIVE_ISSUE_SEPARATOR to KNOWN_NON_ISSUES_SEPARATOR is a good semantic improvement.
  • Suggestion: No further immediate suggestions.

3. src/Utils/metrics_utils.py

  • Issue: There's a potential inconsistency in terminology. The code uses both "true positive" and "issue" to represent the same concept (a correctly identified vulnerability).
  • Suggestion: Standardize the terminology. Given the context of security scanning, using "issue" and "non-issue" consistently might be clearer. Rename is_true_positive to is_issue. Consider renaming positives and negatives to issues and non_issues for better readability.
  • Suggestion: Consider adding type hints to function parameters and return values for improved code clarity and maintainability.
  • Suggestion: The variables KNOWN_ISSUES_SHORT_JUSTIFICATION and KNOWN_FALSE_POSITIVES were renamed to KNOWN_NON_ISSUES_SHORT_JUSTIFICATION and KNOWN_NON_ISSUES. Ensure these changes propagate correctly throughout the application, and that these constants actually contain the appropriate values/data.

In summary, the PR demonstrates improvements in code clarity and robustness. Addressing the minor suggestions above will further enhance the code quality.

Okay, I've reviewed the code changes and here's a summary of potential issues and suggestions:

General Review Summary:

  • Clarity: The code is generally clear, but some variable names could be improved for better readability (e.g., actual_fp, predicted_fp).
  • Efficiency: The logic in the get_metrics function can be simplified. It creates lists of 0s and 1s, which is not efficient. It's calculating metrics from TP, TN, FP, FN counts, not from the underlying data, so it doesn't need y_true or y_pred.
  • Naming Consistency: The function categorize_issues_by_status contains variables named known_false_positives and true_positives, which conflict with the new known_non_issues and issues terminology.
  • Error Handling: The get_numeric_value function could benefit from more specific exception handling.
  • Readability: In categorize_issues_by_status, the long lines assigning multiple counter values can be made more readable.
  • Duplication: There are similarities between calculate_confusion_matrix and get_metrics that could be reduced.

File-Specific Comments:

  • File: [This would be the name of the file containing the code]

    • Issue: Inefficient list creation in get_metrics.

      • Suggestion: Remove y_true and y_pred list creation. The scoring functions accept tp, tn, fp, and fn directly. Calculate metrics using formulas based on these values (e.g., accuracy = (tp + tn) / (tp + tn + fp + fn)). This will be significantly faster and more memory-efficient.
      • Code Snippet:
        def get_metrics(tp, tn, fp, fn):
            total = tp + tn + fp + fn
            if total == 0:
                return 0.0, 0.0, 0.0, 0.0
        
            accuracy = (tp + tn) / total
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
            precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
            f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
        
            return accuracy, recall, precision, f1
    • Issue: Inconsistent naming in categorize_issues_by_status

      • Suggestion: Make variable names consistent and align with their correct meanings
      • Code Snippet:
        # Count known non-issues
        if KNOWN_ISSUES_SHORT_JUSTIFICATION in issue_data.analysis_response.justification:
            counter[KNOWN_NON_ISSUES] += 1
        elif issue_data.analysis_response.status == Status.ISSUE.value:
            counter[ISSUES] += 1
        elif issue_data.analysis_response.status == Status.NON_ISSUE.value:
            counter[NON_ISSUES] += 1
    • Issue: Long lines in categorize_issues_by_status.

      • Suggestion: Break the long lines for better readability.
      • Code Snippet:
        (
            counter[NO_ANALYSIS_RESPONSE],
            counter[NON_FINAL_ISSUES],
            counter[FINAL_ISSUES],
            counter[KNOWN_NON_ISSUES],
            counter[ISSUES],
            counter[NON_ISSUES],
            counter[NEEDS_SECOND_ANALYSIS],
        ) = (0, 0, 0, 0, 0, 0, 0)
    • Issue: Non-specific exception in get_numeric_value.

      • Suggestion: Catch ValueError specifically.
      • Code Snippet:
        try:
            return int(value)
        except ValueError:
            return 0

Let me know when you have the next file to review!

Okay, here's a review of the changes you've provided, focusing on clarity, simplicity, efficiency, and security best practices.

General Review Summary:

The changes seem to be focused on improving the accuracy and clarity of issue classification, specifically around "false positives" (now "non-issues"). There's also a config change to reflect this renaming. The code appears generally safe and doesn't introduce obvious security vulnerabilities.

File-Specific Comments:

File: src/analyzer/analysis.py

  • Issue: The original code used TRUE_POSITIVES and FALSE_POSITIVES. The new code replaces these with ISSUES and NON_ISSUES. The intention is good but renaming like this can be confusing. Are TRUE_POSITIVES now ISSUES? And FALSE_POSITIVES now NON_ISSUES?
  • Suggestion: Add comments to explain the mapping between the old and new naming conventions, or consider more descriptive names that avoid confusion. If TRUE_POSITIVES are literally all issues and FALSE_POSITIVES are all non-issues, then the change is fine.
  • Suggestion: Consider using an enum for the counter keys instead of strings to avoid typos and improve readability.
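
A sketch of the enum idea; the key names are placeholders for whatever constants.py actually defines:

from enum import Enum

class CounterKey(str, Enum):
    ISSUES = "issues"
    NON_ISSUES = "non_issues"
    KNOWN_NON_ISSUES = "known_non_issues"

counter = {key: 0 for key in CounterKey}
counter[CounterKey.ISSUES] += 1  # a typo here fails loudly, unlike a raw string key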

File: src/common/config.py

  • Issue: Renaming KNOWN_FALSE_POSITIVE_FILE_PATH to KNOWN_NON_ISSUES_FILE_PATH and USE_KNOWN_FALSE_POSITIVE_FILE to USE_KNOWN_NON_ISSUES_FILE is a good change for clarity.
  • Suggestion: Ensure that any documentation or configuration examples are also updated to reflect these new names.
  • Suggestion: When validating SIMILARITY_ERROR_THRESHOLD and MAX_ANALYSIS_ITERATIONS consider creating a reusable function to avoid repeating similar validation logic.

Overall, these changes appear to be a good step toward improving the accuracy and clarity of the system. Just be mindful of the naming consistency and documentation updates.

--- a/src/common/constants.py
+++ b/src/common/constants.py
@@ -18,9 +18,9 @@
 OUTPUT_FILE_PATH = "OUTPUT_FILE_PATH"
 AGGREGATE_RESULTS_G_SHEET = "AGGREGATE_RESULTS_G_SHEET"
 INPUT_REPORT_FILE_PATH = "INPUT_REPORT_FILE_PATH"
-KNOWN_FALSE_POSITIVE_FILE_PATH = "KNOWN_FALSE_POSITIVE_FILE_PATH"
+KNOWN_NON_ISSUES_FILE_PATH = "KNOWN_NON_ISSUES_FILE_PATH"
 HUMAN_VERIFIED_FILE_PATH = "HUMAN_VERIFIED_FILE_PATH"
-USE_KNOWN_FALSE_POSITIVE_FILE = "USE_KNOWN_FALSE_POSITIVE_FILE"
+USE_KNOWN_NON_ISSUES_FILE = "USE_KNOWN_NON_ISSUES_FILE"
 CALCULATE_RAGAS_METRICS = "CALCULATE_RAGAS_METRICS"
 TOKENIZERS_PARALLELISM = "TOKENIZERS_PARALLELISM"
 RUN_WITH_CRITIQUE = "RUN_WITH_CRITIQUE"
@@ -36,31 +36,29 @@
     "Please check this Issue-id {issue_id}."
     "\nError: {error}"
 )
-FALLBACK_JUSTIFICATION_MESSAGE = [
-    "Failed during analyze process. Defaulting to: NOT A FALSE POSITIVE."
-]
+FALLBACK_JUSTIFICATION_MESSAGE = ["Failed during analyze process. Defaulting to: ISSUE."]
 YES_OPTIONS = ["y", "yes"]
 NO_OPTIONS = ["n", "no"]
 ALL_VALID_OPTIONS = YES_OPTIONS + NO_OPTIONS
-KNOWN_FALSE_POSITIVE_ISSUE_SEPARATOR = "\n\n"
+KNOWN_NON_ISSUES_SEPARATOR = "\n\n"
 
 # Template and formatting constants
-KNOWN_FALSE_POSITIVE_TEMPLATES = {
-    "EXAMPLE_MULTILINE_TEMPLATE": (
-        "** Example-{number} **\n"
-        "(Example-{number}) Known False Positive:\n"
-        "Error {issue_type} ({issue_cwe}):\n"
-        "{error_trace}\n"
-        "(Example-{number}) Reason Marked as False Positive:\n"
-        "{reason}\n\n"
-    ),
-    "FILTER_CONTEXT_TEMPLATE": (
-        "Known False Positive {index}:\n"
-        "false_positive_error_trace:\n"
-        "{error_trace}\n"
-        "reason_marked_false_positive:\n"
-        "{reason}\n\n"
-    ),
+KNOWN_NON_ISSUES_TEMPLATES = {
+    "EXAMPLE_MULTILINE_TEMPLATE": (
+        "** Example-{number} **\n"
+        "(Example-{number}) Known Non-Issue:\n"
+        "Error {issue_type} ({issue_cwe}):\n"
+        "{error_trace}\n"
+        "(Example-{number}) Reason Marked as Non-Issue:\n"
+        "{reason}\n\n"
+    ),
+    "FILTER_CONTEXT_TEMPLATE": (
+        "Known Non-Issue {index}:\n"
+        "non_issue_error_trace:\n"
+        "{error_trace}\n"
+        "reason_marked_non_issue:\n"
+        "{reason}\n\n"
+    ),
 }
 REGEX_PATTERNS = {"CWE_PATTERN": r"CWE-\d+", "CODE_BLOCK_LINE_PATTERN": r"#\s*\d+\|"}
 
@@ -71,7 +69,7 @@
+FALSE = "FALSE"
+TRUE = "TRUE"
 VALIDATION_LIMITS = {
-    "MIN_SIMILARITY_THRESHOLD": 1, "MAX_SIMILARITY_THRESHOLD": 10,
+    "MIN_SIMILARITY_THRESHOLD": 1,
+    "MAX_SIMILARITY_THRESHOLD": 10,
     "MIN_ANALYSIS_ITERATIONS": 1,
 }
-KNOWN_ISSUES_SHORT_JUSTIFICATION = "The error is similar to one found in the provided known issues (Details in the full Justification)"
+KNOWN_NON_ISSUES_SHORT_JUSTIFICATION = "The error is similar to one found in the provided known non-issues (Details in the full Justification)"
 NO_MATCHING_TRACE_FOUND = "No matching trace found"
 
 # Pre-process constants

Review Summary:

The primary change in this diff is renaming "False Positive" to "Non-Issue" throughout the constants.py file. This seems like a reasonable semantic change. The code changes themselves are straightforward find and replace operations.

File Comments:

  • src/common/constants.py:
    • Consider whether FALLBACK_JUSTIFICATION_MESSAGE should be configurable. It is hardcoded to ISSUE.
    • The addition of FALSE and TRUE constants might be better handled using bool type directly in the code where they are used. Evaluate the benefits of these string constants vs. boolean values.
    • Consistency: Ensure the term "Non-Issue" is consistently used throughout the codebase, not just in this file. Search the whole project.

Okay, I've reviewed the code differences you provided. Here's a summary of the issues and suggestions for improvement:

Review Summary:

The changes mainly focus on standardizing terminology and adding clarity around issue categorization. The introduction of ISSUE and NON_ISSUE constants and renaming variables to reflect this is a good step. However, some areas can be further improved for consistency and clarity.

File-Specific Comments:

File: src/common/constants.py

  • Clarity: Consider renaming KNOWN_FALSE_POSITIVES to KNOWN_NON_ISSUES for consistency with the new NON_ISSUE terminology.
  • Consistency: Introduce ISSUES and NON_ISSUES to supersede TRUE_POSITIVES and FALSE_POSITIVES.
  • Consider Enum: Instead of separate NON_ISSUE and ISSUE string constants, consider defining them in an enum if they're related to CVEValidationStatus.

File: src/dto/LLMResponse.py

  • Consistency: Rename TRUE_POSITIVE and FALSE_POSITIVE to ISSUE and NON_ISSUE respectively in the CVEValidationStatus enum.
  • Method Naming: Rename is_true_positive() to is_issue() for clarity and consistency (see the sketch below).

File: src/dto/ResponseStructures.py

  • No specific issues found.

Let me know if you'd like a more detailed explanation of any of these points.

Okay, I've reviewed the changes. Here's a summary and some specific comments:

Review Summary:

The changes primarily involve renaming "false positive" to "non-issue" in the context of known issues. This seems to be a semantic change to better reflect the nature of these known issues. The code changes are straightforward and consistent with the renaming. The data model and constants were modified as well.

File-Specific Comments:

  • File: dto/FilterResponse.py
    • No issues found. The change is a simple renaming and the code is clear.
  • File: src/run.py
    • Consider renaming already_seen_issues_dict and similar_known_issues_dict to already_seen_non_issues_dict and similar_known_non_issues_dict respectively for clarity and consistency with the rest of the changes.

Overall, the changes seem reasonable and well-implemented. Renaming the variables will improve readability.

Okay, I've reviewed the changes. Here's a summary of the potential issues and suggestions:

Review Summary:

The changes primarily revolve around renaming "false positives" to "non-issues" and adjusting the logic and configurations accordingly. There are also some refactoring of imports. The core functionality of filtering and identifying known issues remains the same.

File-Specific Comments:

  • src/sast_agent_workflow/agent.py:

    • Issue: Renaming KNOWN_ISSUES_SHORT_JUSTIFICATION to KNOWN_NON_ISSUES_SHORT_JUSTIFICATION and CVEValidationStatus.TRUE_POSITIVE.value to CVEValidationStatus.ISSUE.value seems correct, aligning with the change in terminology. However, ensure that CVEValidationStatus.ISSUE is the correct intended status for items now classified as "non-issues". Double-check the implications of this status change in downstream processing.
    • Suggestion: Add a comment explaining why CVEValidationStatus.ISSUE is being used for what were previously classified as TRUE_POSITIVE.
  • src/sast_agent_workflow/configs/config.yml:

    • The change in description is straightforward and reflects the renaming. No issues here.
  • src/sast_agent_workflow/tools/filter.py:

    • The renaming of functions like is_known_false_positive to is_known_non_issue is appropriate.
    • The import statement reordering is fine.

Overall, the changes seem logically consistent with the goal of renaming "false positives" to "non-issues." However, ensure that the status code changes have been validated against the expected behavior.

Okay, I've reviewed the code diff. Here's a summary and some specific comments:

Review Summary:

The changes rename "false positive" terminology to "non-issue" which appears to be the core intent. The code is generally well-structured and readable. There are improvements in clarity and consistency.

File: sdv/core/verifiers/verifier.py

  • Clarity: The renaming of variables and functions from false_positive to non-issue improves clarity and aligns with the intended meaning.
  • Consistency: The update of CVEValidationStatus and justification messages ensures consistency across the module.
  • Duplication: The code has reduced duplication by creating functions.
  • Naming:
    • is_known_false_positive and _create_false_positive_response are renamed to is_known_non_issue and _create_non_issue_response. This is good.

I don't see any major issues that would prevent merging this PR. The changes improve the clarity and maintainability of the code.

Okay, I've reviewed the code differences and here are my observations:

Review Summary:

The primary change seems to be renaming "false positive" to "non-issue," which is good for clarity and avoiding negative connotations. There are also some minor logging and documentation updates to reflect this change. The code seems straightforward, and I don't see any obvious performance issues or security vulnerabilities. However, I have a few suggestions for improvement.

File Comments:

File: sast_filter.py

  • Clarity: The change from "false positive" to "non-issue" is good. It's clearer and less loaded.
  • Consistency: Ensure this renaming is applied consistently throughout the codebase, not just this file. Search for other instances of "false positive" to see if they also need updating.
  • Error Message: In the except block, consider adding more context to the error message. For example, include the specific issue ID that caused the error.
  • Config Variable Name: The config variable USE_KNOWN_FALSE_POSITIVE_FILE should be renamed to USE_KNOWN_NON_ISSUES_FILE for consistency.
  • Logging: Consider using more structured logging (e.g., with key-value pairs) instead of just string formatting. This will make it easier to analyze logs programmatically.
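
On the logging point, a hedged sketch of key-value logging with the standard library; the field names are illustrative:

import logging

logger = logging.getLogger(__name__)

issue_id = "def123"  # placeholder; in the real code this comes from the issue object

# extra= attaches machine-readable fields to the log record, so a formatter or
# log shipper can index issue_id and status without parsing the message string.
logger.info("Filtered known non-issue", extra={"issue_id": issue_id, "status": "NON_ISSUE"})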

Okay, I've reviewed the code differences you've provided. Here's a summary of potential issues and suggestions for improvement, focusing on clarity, simplicity, and efficiency:

General Comments

  • Clarity on Statuses: The change from KNOWN_FALSE_POSITIVES to KNOWN_NON_ISSUES is good for clarity. Make sure this change is reflected consistently throughout the codebase.
  • Error Handling: The try...except block for creating the known_issue_retriever is good. Consider logging the exception type as well for more detailed debugging.
  • Default Values: The use of DEFAULT_FIELD_VALUE is a bit opaque. Consider making this a more descriptive constant (e.g., DEFAULT_JUSTIFICATION or NO_RECOMMENDATION_AVAILABLE).

File-Specific Comments

  • File: src/sast_agent_workflow/tools/filter.py
    • Issue: The return of tracker inside the except block might lead to unexpected behavior if the known_issue_retriever fails to initialize. The filter will be skipped entirely and the original tracker is returned. Is this the intended behavior? Consider re-raising the exception or returning a tracker object that signifies that filtering has not been performed.
    • Suggestion: Add a comment explaining why the tracker is returned in the exception block, and ensure this behavior aligns with the overall workflow.
  • File: src/sast_agent_workflow/tools/pre_process.py
    • Issue: The initial value in AnalysisResponse was changed from CVEValidationStatus.TRUE_POSITIVE.value to CVEValidationStatus.ISSUE.value.
    • Suggestion: Make sure this change is intentional and will not affect the workflow.

I'm ready for the next set of code differences.

Okay, I've reviewed the code differences you provided. Here's a summary of potential issues and suggestions for improvement, focusing on clarity, efficiency, and robustness:

General Review Summary:

The changes seem to focus on the pre_process function, which initializes the SAST workflow. The updates include:

  • Adding a description to the function registration.
  • Initializing a Config object within the _pre_process_fn.
  • Creating a dictionary issues_dict to store PerIssueData.
  • Initializing a repository handler.
  • Improved logging.

File-Specific Comments:

File: src/functions/pre_process.py

  • Clarity: The added descriptions are good.
  • Error Handling: The try...except blocks are good for catching exceptions during config loading and repo initialization. However, consider logging the stack trace along with the error message to provide more context for debugging.
  • Efficiency: Creating issues_dict is fine if the number of issues is relatively small. For a very large number of issues, consider if a generator could be used to process them in chunks instead of loading everything into memory at once. However, this adds complexity, so only consider it if memory becomes a bottleneck.
  • Config Instantiation: Instantiating the Config object inside the _pre_process_fn seems a bit odd. Is there a reason it's not instantiated at the module level or passed in? If it's intended to be re-initialized every time the function runs, that's fine, but it's worth clarifying.
  • Resource Cleanup: The finally block only logs a message. Are there any actual resources that need to be cleaned up (e.g., closing file handles, releasing network connections)? If not, the finally block is unnecessary.

Improved Code Suggestion for src/functions/pre_process.py

@register_function(config_type=PreProcessConfig)
async def pre_process(config: PreProcessConfig, builder: Builder):
    """Register the Pre_Process function."""

    logger.info("Initializing Pre_Process function...")

    async def _pre_process_fn(empty_input: dict) -> SASTWorkflowTracker:
        """
        Pre-processing function for SAST workflow.
        """
        logger.info("Running Pre_Process node - initializing workflow")

        try:
            config = Config()  # consider moving this outside if it doesn't need re-initialization
        except Exception as e:
            logger.exception("Failed to read SAST config")  # logs the stack trace
            raise RuntimeError(f"Failed to read SAST config: {e}") from e

        issue_list = config.get_issues()
        issues_dict = {}
        for issue in issue_list:
            issues_dict[issue.id] = PerIssueData(
                issue=issue,
                source_code={},
                similar_known_issues="",
                analysis_response=_create_default_analysis_response(),
                found_symbols=set(),
            )
        tracker = SASTWorkflowTracker(
            issues=issues_dict, config=config, iteration_count=0, metrics={}
        )
        try:
            # Initialize the repo handler, just to download the repo if needed
            repo_handler = builder.repo_handler  # Assuming builder has repo_handler
            await repo_handler.initialize()
        except Exception as e:
            logger.exception("Failed to initialize repository handler")
            raise RuntimeError(f"Repository handler initialization failed: {e}") from e

        logger.info("Pre_Process node completed")
        return tracker

    try:
        yield FunctionInfo.create(
            single_fn=_pre_process_fn,
            description=config.description,
            input_schema=None,  # No input for pre_process
        )
    except GeneratorExit:
        logger.info("Pre_Process function exited early!")
    finally:
        logger.info("Cleaning up Pre_Process function.") # Remove if no actual cleanup is needed

File: src/services/issue_analysis_service.py

  • No comments, since only an incomplete line was provided. Please provide the complete diff for this file.

Let me know if you have more code to review!

Okay, I've reviewed the code changes. Here's a summary of my findings and suggestions, focusing on clarity, efficiency, and best practices:

Review Summary:

The changes improve the structure and clarity of the IssueAnalysisService, particularly in how it interacts with the LLM. The use of langchain_core components for prompt construction is a positive step. However, there are opportunities to further refine the code for better readability and maintainability.

File-Specific Comments:

Based on the provided diff, here are some specific comments for the file:

  • File: (The file name isn't provided, but assuming it's something like issue_analysis_service.py)

    • Clarity and Consistency: The code has been improved with better structure and formatting.
    • Docstrings: Docstrings are generally good, but could be slightly more descriptive in some places (e.g., specify the data type of arguments and return values).
    • Error Handling: The @retry decorator is good, but consider adding logging within the retried function to provide more insight into why retries are happening. This is especially helpful for debugging intermittent LLM issues.
    • "False Positive" vs "Non-Issue": The change from "false positive" to "non-issue" is good for accuracy. Ensure this terminology is consistently used throughout the codebase.
    • Template Path: The template_path construction is fine, but consider making it configurable via the config object passed to the class. This would make it easier to deploy the code in different environments.
    • filter_known_issues_from_context function: Instead of returning from the middle of the function, consider assigning a response variable and using a single return statement at the end. This enhances the readability of the code.

I will wait for the next diffs.

Okay, I've reviewed the code differences. Here's a summary of my observations and suggestions:

Review Summary:

The changes primarily focus on improving the structure and robustness of the IssueAnalysisService, particularly the filter_known_error and analyze_issue methods. The changes include:

  • Improved prompt formatting for clarity.
  • Enhanced error handling with more informative logging.
  • Type hinting for better code readability.

File: backend/src/grapycal_coder/issue_analysis/issue_analysis.py

  • Issue: The comment says "Should not use 'system' for deepseek-r1". If there's a specific reason for this, it should be explained in more detail. Is this a DeepSeek specific requirement/bug?

    • Suggestion: Add a more detailed comment explaining the reason for not using "system" with deepseek-r1. For example: # DeepSeek models have an issue with the 'system' role. Using it can lead to unpredictable behavior, so we avoid it here.
  • Issue: The error message in the except block of filter_known_error is good, but consider adding the exception type to the log for easier debugging.

    • Suggestion: Modify the error logging to include the exception type:
      logger.error(
          RED_ERROR_FOR_LLM_REQUEST.format(
              max_retry_limit=self.max_retry_limit,
              function_name="filter_known_error",
              issue_id=issue.id,
              error=f"{type(e).__name__}: {e}", # ADDITION HERE
          )
      )
  • Suggestion: The docstring for analyze_issue says "false positive", but the function itself says "non-issue". I suggest you align these terms.

Okay, I've reviewed the changes. Here's a summary of my observations:

Overall Review Summary:

The code changes seem to be primarily focused on improving error handling and providing fallback mechanisms within the IssueAnalysisService. The addition of comprehensive error handling is a good practice.

File: IssueAnalysisService.py

  • Issue: The original code had a general except Exception as e: block.
    • Improvement: The code now sets default values for the fields if any exception occurs, which is a good pattern.
  • Issue: The default value assignment uses a ternary if/else, which can reduce readability.
    • Suggestion: Simplify it with plain if statements, for example:
try:
    # ... your code ...
except Exception as e:
    failed_message = "Failed during analyze process"
    logger.error(f"{failed_message}, set default values ... Error is: {e}")
    llm_analysis_response = AnalysisResponse(...) # with fallback values

    if analysis_response is None:
        llm_analysis_response.investigation_result = CVEValidationStatus.ISSUE.value
        llm_analysis_response.justifications = FALLBACK_JUSTIFICATION_MESSAGE
    if recommendations_response is None:
        llm_analysis_response.is_final = FinalStatus.TRUE.value
        llm_analysis_response.evaluation = [failed_message]
        llm_analysis_response.recommendations = [failed_message]
        llm_analysis_response.instructions = []

Okay, I've reviewed the code changes. Here's a summary of potential issues and suggestions:

Review Summary:

The code appears to be focused on improving the robustness and clarity of the analysis process, particularly within the context of evaluating security issues. The changes include better error handling, retries, and clearer messaging. However, there's an opportunity to further simplify some of the conditional logic and potentially improve the prompt construction.

File-Specific Comments:

Let's assume the file name is analyzer.py

  • File: analyzer.py
    • Issue: The nested if/else statements for assigning recommendations_response.instructions and short_justifications_response.short_justifications can be a bit hard to read.

    • Suggestion: Use conditional expressions to assign the defaults in a single step; it will be cleaner:

      recommendations = recommendations_response.instructions if recommendations_response else None
      short_justifications = short_justifications_response.short_justifications if short_justifications_response else f"{failed_message}. Please check the full justifications."
    • Issue: The retry decorator is good, but consider logging the exceptions being retried for better debugging.

    • Suggestion: Log inside the retry machinery; if the decorator is tenacity's, for example, a before_sleep=before_sleep_log(logger, logging.WARNING) argument logs each retried exception.

    • Issue: Changing false positive to non-issue improves clarity.

    • Suggestion: Ensure that the change to non-issue is applied consistently throughout the codebase. A simple search within the project can verify this.

These suggestions aim to improve the readability, maintainability, and debuggability of the code.

Okay, I've reviewed the code changes. Here's a summary of potential issues and suggestions:

Overall Review Summary:

The code introduces an analysis_prompt_chain to streamline the prompt construction for vulnerability analysis. The changes include prompt templates, LLM response parsing, and error handling. The use of RunnableLambda and RunnablePassthrough makes the prompt construction more modular. However, there are some areas where clarity, efficiency, and error handling could be improved.

File-Specific Comments:

Based on the snippet provided, I can't pinpoint the exact filenames, but I'll assume the code relates to IssueAnalysisService.py or a similar class.

  • File: IssueAnalysisService.py (Assumed)
    • Issue: The analysis_prompt_chain is created, but the actual_prompt isn't used; it's only logged. This seems wasteful.
      • Suggestion: Use the analysis_prompt_chain result directly in the LLM call, rather than invoking and discarding its output.
    • Issue: Broad Exception handling in _analyze method.
      • Suggestion: Use more specific exception handling. For example, catch openai.APIError or requests.Timeout to retry network-related issues. Also, consider logging the exception type along with the message for better debugging.
    • Issue: The retry decorator in summarize_justification retries on any exception. This is too broad.
      • Suggestion: Retry only on specific exceptions that are likely to be transient, such as network errors or rate limits. Use retry_if_exception_type with a tuple of specific exception types (see the sketch below).
    • Issue: Long string for examples_str in summarize_justification.
      • Suggestion: Move examples_str to the config or as a constant variable to make it more readable.
    • Issue: The actual_prompt.to_string() is not a standard way to print a prompt for debugging.
      • Suggestion: Use actual_prompt.pretty_print() if available or actual_prompt.format() to display the prompt in a readable format.
    • Issue: Inconsistent use of f-strings vs. .format() for logging.
      • Suggestion: Choose one style (preferably f-strings for readability) and use it consistently throughout the code.
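
For the retry point, assuming the decorator is tenacity's, restricting retries to transient failures could look like this; the exception types are stand-ins for whatever the LLM client actually raises:

import logging

from tenacity import (
    before_sleep_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

logger = logging.getLogger(__name__)

@retry(
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, max=10),
    before_sleep=before_sleep_log(logger, logging.WARNING),  # logs each retried exception
)
def summarize_justification(prompt: str) -> str:
    ...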

Security Considerations (General):

  • Prompt Injection: Be extremely careful about user-provided input that is directly inserted into prompts. Sanitize or validate the user_input to prevent prompt injection attacks. Consider using techniques like prompt hardening or input validation to mitigate this risk.
  • Rate Limiting/Abuse: If this service is exposed to external users, implement robust rate limiting to prevent abuse and ensure fair usage of the LLM API.
  • Data Sensitivity: Ensure that sensitive data (e.g., API keys, user data) is not inadvertently included in prompts or logged.

I will provide more specific comments if you provide the full code of relevant files.

Okay, here's a review of the code changes, focusing on clarity, simplicity, efficiency, and error handling:

Review Summary:

The changes seem to be focused on improving the structure and readability of the prompt chains used for justification summarization and recommendation generation. The use of ChatPromptTemplate.from_messages is good for clarity. The use of RunnableLambda and RunnablePassthrough within the prompt chains seems a bit verbose. The error handling includes logging, which is good, but re-raising the exception might not be the best strategy in all cases. The retry logic is appropriate.

File Comments:

File: [Provide the actual file name where the changes are made]
  • Clarity: The refactoring to define prompt using ChatPromptTemplate.from_messages significantly improves readability. Good job!

  • Simplicity: Consider if the RunnableLambda calls are truly necessary. If the values being passed are already available in the scope where the prompt chain is defined, you might be able to pass them directly without the lambda.

  • Error Handling: The try...except block in summarize_justification re-raises the exception after logging. While logging is good, re-raising might interrupt the overall process. Consider whether you should handle the exception (e.g., by returning a default value or a more informative error message) instead of just propagating it. This depends on the desired behavior of the calling function.

  • Security: No immediate security concerns are apparent in this snippet. However, always be mindful of prompt injection vulnerabilities when constructing prompts from user-provided data (which doesn't seem to be the case here, but is a general principle).

Okay, I've reviewed the code diff. Here's a summary of potential issues and suggestions:

Review Summary:

The changes primarily involve error handling and retry mechanisms for LLM calls, which is good for robustness. The logging is improved, and the code is more readable. However, there's a potential issue with broad exception handling, which could mask underlying problems.

File-Specific Comments:

  • src/automatik/actions/analyze.py:
    • Issue: The except Exception as e: blocks in both recommend and _evaluate methods are too broad. This catches all exceptions, including those you might not want to handle (e.g., KeyboardInterrupt, SystemExit).
    • Suggestion: Use more specific exception types. For example, catch openai.APIError, requests.exceptions.RequestException, or a custom exception type that you define for LLM-related errors. This allows you to handle expected errors gracefully while still allowing unexpected errors to propagate.
    • Suggestion: Consider adding a finally block to ensure resources are cleaned up, regardless of whether an exception occurred.
    • Suggestion: The retry decorator is used for the _evaluate function. It would be helpful to add this to the recommend function as well to handle transient errors.
  • src/templates/prompts/analysis_system_prompt.yaml:
    • No issues found.

Overall, the changes improve the robustness of the code. Just be sure to refine the exception handling to be more specific.
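
Here is a sketch of that narrower handling; the exception type and fallback are examples, not what analyze.py necessarily needs:

import logging

import requests

logger = logging.getLogger(__name__)

def recommend_with_fallback(call_llm, prompt: str) -> str:
    try:
        return call_llm(prompt)
    except requests.exceptions.RequestException as e:
        # A transient network failure: log the exception type for context and
        # degrade gracefully; KeyboardInterrupt and programming errors still propagate.
        logger.error("LLM request failed (%s): %s", type(e).__name__, e)
        return "Recommendation unavailable due to a transient error."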

Okay, I've reviewed the changes to analysis_system_prompt.yaml. Here's a summary and specific comments:

Review Summary:

The primary change is renaming "FALSE POSITIVE" and "TRUE POSITIVE" to "NON_ISSUE" and "ISSUE" respectively. The prompt is also updated to reflect the change. The change appears to be aimed at improving clarity and reducing potential for misinterpretation of what constitutes a positive or negative result.

File: analysis_system_prompt.yaml

  • Issue: The prompt mentions "verified CVEs with the same CWE" but doesn't explain how that information will be provided. Consider explicitly stating how those examples will be included (e.g., in the same input, as separate files, etc.).

    • Suggestion: Add a sentence clarifying how the examples of verified CVEs are provided to the analyst. For example: "Examples of verified CVEs will be provided in a separate section after the CVE report snippet and source code."
  • Issue: "justifications", type: [string]: (The reasoning that led to the investigation_result decision)} is not valid JSON. The type definition is outside the valid syntax.

    • Suggestion: Change it to "justifications": {"type": "array", "items": {"type": "string"}, "description": "The reasoning that led to the investigation_result decision"}, which is a valid JSON Schema property.
  • Comment: The instruction "No Clear False Positive Evidence Implies True Positive" is now "No Clear Non_Issue Evidence Implies Issue". This is logically consistent with the rename.

Overall, the changes are reasonable and improve the clarity of the prompt by changing the terminology. The suggestions are meant to further improve clarity and correct the JSON format.

Okay, I've reviewed the changes you've provided. Here's a breakdown of the issues and suggestions for improvement:

Review Summary:

The primary focus of these changes is renaming "false positive" to "non-issue". This is a good change, as the new term is more accurate and less loaded. The changes are mostly correct, but there are a couple of places where the renaming is inconsistent or the logic could be slightly improved.

File-Specific Comments:

  • src/templates/prompts/filter_system_prompt.yaml

    • Issue: Consistency. In the original file the context variable was named context_false_positives; it has been renamed to context_non_issues. This is a good change, but it's important to make sure the variable names that reference it stay consistent as well.
    • Suggestion: Ensure that all references to the context variable in the code that uses this prompt are also updated to reflect the non_issue naming (a short sketch follows this review).
  • tests/integration/test_filter_tool_integration.py

    • Issue: Consistency. The validation status should be consistent with the new nomenclature.
    • Suggestion: Ensure the validation status is updated. CVEValidationStatus.FALSE_POSITIVE.value needs to be changed to CVEValidationStatus.NON_ISSUE.value
  • TRUE POSITIVE

    • Issue: Stale terminology. The instruction "If you identify syntax issue in the reported finding - mark it as TRUE POSITIVE." still uses the old label.
    • Suggestion: Change it to "If you identify syntax issue in the reported finding - mark it as ISSUE."

Overall, the changes are positive and improve the clarity and accuracy of the codebase. Just a few minor tweaks to ensure consistency are needed.
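As a sketch of the consistency point, assuming a LangChain prompt (the variable name comes from the review; everything else is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate

# The template variable was renamed from context_false_positives to
# context_non_issues, so every call site must use the new key.
filter_prompt = ChatPromptTemplate.from_messages([
    ("system", "Known non-issues for this package:\n{context_non_issues}"),
    ("human", "{finding}"),
])

# Invoking with the old key would fail with a missing-variable error;
# the values below are placeholders.
prompt_value = filter_prompt.invoke({
    "context_non_issues": "shutdown-path memory leak is accepted",
    "finding": "use of uninitialized variable in parse_config()",
})
print(prompt_value.to_string())
```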

--- a/tests/nat_tests/test_calculate_metrics.py
+++ b/tests/nat_tests/test_calculate_metrics.py
@@ -81,9 +81,9 @@ class TestCalculateMetricsCore(unittest.IsolatedAsyncioTestCase):
         
         issue_ids = list(tracker.issues.keys())
         tracker.issues[issue_ids[0]].analysis_response.is_final = FinalStatus.TRUE.value
-        tracker.issues[issue_ids[0]].analysis_response.investigation_result = CVEValidationStatus.TRUE_POSITIVE.value
+        tracker.issues[issue_ids[0]].analysis_response.investigation_result = CVEValidationStatus.ISSUE.value
         tracker.issues[issue_ids[1]].analysis_response.is_final = FinalStatus.FALSE.value
-        tracker.issues[issue_ids[1]].analysis_response.investigation_result = CVEValidationStatus.TRUE_POSITIVE.value
+        tracker.issues[issue_ids[1]].analysis_response.investigation_result = CVEValidationStatus.ISSUE.value
         
         # testing
         with patch('sast_agent_workflow.tools.calculate_metrics.get_human_verified_results') as mock_get_ground_truth:
@@ -108,9 +108,9 @@ class TestCalculateMetricsCore(unittest.IsolatedAsyncioTestCase):
         tracker.config = self._create_mock_config(calculate_ragas_metrics=True)
         
         issue_ids = list(tracker.issues.keys())
-        tracker.issues[issue_ids[0]].analysis_response.investigation_result = CVEValidationStatus.TRUE_POSITIVE.value
+        tracker.issues[issue_ids[0]].analysis_response.investigation_result = CVEValidationStatus.ISSUE.value
         tracker.issues[issue_ids[0]].analysis_response.is_final = FinalStatus.TRUE.value
-        tracker.issues[issue_ids[1]].analysis_response.investigation_result = CVEValidationStatus.FALSE_POSITIVE.value
+        tracker.issues[issue_ids[1]].analysis_response.investigation_result = CVEValidationStatus.NON_ISSUE.value
         tracker.issues[issue_ids[1]].analysis_response.is_final = FinalStatus.TRUE.value
         
         # testing
@@ -148,9 +148,9 @@ class TestCalculateMetricsCore(unittest.IsolatedAsyncioTestCase):
         tracker.config = self._create_mock_config(calculate_ragas_metrics=True)
         
         issue_ids = list(tracker.issues.keys())
-        tracker.issues[issue_ids[0]].analysis_response.investigation_result = CVEValidationStatus.FALSE_POSITIVE.value
+        tracker.issues[issue_ids[0]].analysis_response.investigation_result = CVEValidationStatus.NON_ISSUE.value
         tracker.issues[issue_ids[0]].analysis_response.is_final = FinalStatus.TRUE.value
-        tracker.issues[issue_ids[1]].analysis_response.investigation_result = CVEValidationStatus.FALSE_POSITIVE.value
+        tracker.issues[issue_ids[1]].analysis_response.investigation_result = CVEValidationStatus.NON_ISSUE.value
         tracker.issues[issue_ids[1]].analysis_response.is_final = FinalStatus.TRUE.value
         
         # testing
@@ -191,7 +191,7 @@ class TestCalculateMetricsCore(unittest.IsolatedAsyncioTestCase):
         tracker.config = self._create_mock_config(calculate_ragas_metrics=True)
         
         for per_issue_data in tracker.issues.values():
-            per_issue_data.analysis_response.investigation_result = CVEValidationStatus.TRUE_POSITIVE.value
+            per_issue_data.analysis_response.investigation_result = CVEValidationStatus.ISSUE.value
             per_issue_data.analysis_response.is_final = FinalStatus.TRUE.value
         
         # testing
@@ -275,7 +275,7 @@ class TestCalculateMe

Review Summary:

The changes in tests/nat_tests/test_calculate_metrics.py primarily involve renaming CVEValidationStatus.TRUE_POSITIVE to CVEValidationStatus.ISSUE and CVEValidationStatus.FALSE_POSITIVE to CVEValidationStatus.NON_ISSUE. These changes update the test cases to align with the renamed CVEValidationStatus enum.

File: tests/nat_tests/test_calculate_metrics.py

  • Comment: The renaming from TRUE_POSITIVE to ISSUE and FALSE_POSITIVE to NON_ISSUE makes these tests more general, no longer specific to CVEs. Confirm that the enum is intended to represent any kind of finding, not just CVEs; if it is CVE-only, the original naming may be better.

Okay, I've reviewed the changes. The core change seems to be replacing CVEValidationStatus.TRUE_POSITIVE.value with CVEValidationStatus.ISSUE.value in several test files. Without context, it's hard to say definitively, but here's a breakdown of potential issues and suggestions:

Review Summary:

The change consistently replaces TRUE_POSITIVE with ISSUE in the context of CVEValidationStatus within test cases. This suggests a possible shift in how test data is being categorized or interpreted. It's crucial to understand why this change is being made. Is TRUE_POSITIVE no longer the correct status for these test scenarios? Is ISSUE a more general or appropriate status?

Potential Issues and Suggestions:

  1. Lack of Context: The commit message (if any) should clearly explain why this change is necessary. What does this change represent in the bigger picture? Why is ISSUE now being used instead of TRUE_POSITIVE in these tests?

  2. Impact on Tests: Changing the expected status of issues could have significant impacts on the tests themselves. Are the tests still valid with this new status? Have the tests been updated to reflect the expected behavior with the ISSUE status?

  3. Consistency: Ensure that this change is consistently applied wherever appropriate. Are there other tests or code areas that should also be updated to use ISSUE instead of TRUE_POSITIVE?

  4. Definition of ISSUE: What exactly does the ISSUE status represent in the CVEValidationStatus enum/class? Is it well-defined and understood within the project?

File-Specific Comments:

  • tests/nat_tests/test_core.py:

    • The change seems isolated, but the reasoning needs to be clarified.
    • Is the investigation result now categorized as a general ISSUE instead of specifically a TRUE_POSITIVE?
  • tests/nat_tests/test_data_fetcher.py:

    • Multiple instances of the change. This suggests a pattern.
    • It's used in the create_sample_analysis_response function. This function's purpose should be considered in light of the change.
  • tests/nat_tests/test_filter.py:

    • No changes in this file; nothing to review here.

Recommendations:

  • Require a clear and detailed commit message explaining the rationale behind the change.
  • Ensure the tests are still valid and updated to reflect the expected behavior with the ISSUE status.
  • Verify that the ISSUE status is well-defined and consistently used throughout the project (one possible definition is sketched below).
  • Consider adding comments to the code explaining the reason for the change, especially if it's not immediately obvious.

Without a deeper understanding of the project and the specific meaning of these status values, it's difficult to provide more specific feedback. However, these suggestions should help guide a more thorough review.
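One way to resolve points 1 and 4 above is to pin the semantics down in the enum itself. A possible definition, offered as an assumption for discussion rather than the project's current meaning:

```python
from enum import Enum

class CVEValidationStatus(Enum):
    """Outcome of investigating a single SAST finding.

    These docstring semantics are a proposal: if ISSUE is meant to be a
    confirmed vulnerability, the old TRUE_POSITIVE name may fit better.
    """

    # The finding is a real problem that needs developer action.
    ISSUE = "ISSUE"
    # The finding was investigated and confirmed not to be a problem.
    NON_ISSUE = "NON_ISSUE"
```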

Okay, I've reviewed the changes. Here's a summary of the issues and suggestions for improvement:

Review Summary:

The main changes revolve around renaming constants and enum values to better reflect the intended meaning. Specifically, FALSE_POSITIVE is replaced with NON_ISSUE and TRUE_POSITIVE with ISSUE. This is a good change, as it improves clarity and avoids potential misinterpretation.

File-Specific Comments:

  • File: tests/core_tests/test_filter.py
    • Issue: The constant KNOWN_ISSUES_SHORT_JUSTIFICATION is renamed to KNOWN_NON_ISSUES_SHORT_JUSTIFICATION. Suggestion: Ensure that the old constant KNOWN_ISSUES_SHORT_JUSTIFICATION is not being used anywhere else in the codebase. Consider adding a deprecation warning if it is still used but should be removed in the future.
    • Issue: CVEValidationStatus.FALSE_POSITIVE.value is changed to CVEValidationStatus.NON_ISSUE.value and CVEValidationStatus.TRUE_POSITIVE.value is changed to CVEValidationStatus.ISSUE.value. This looks good for consistency.
  • File: tests/nat_tests/test_graph_builder.py
    • Issue: Similar to the previous file, CVEValidationStatus.FALSE_POSITIVE.value is changed to CVEValidationStatus.NON_ISSUE.value and CVEValidationStatus.TRUE_POSITIVE.value is changed to CVEValidationStatus.ISSUE.value. Looks good.

In general, the changes are positive and improve the clarity of the codebase. Just be sure to check for usages of the old constants to avoid any issues; a minimal deprecation-alias sketch follows.
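A sketch of the deprecation suggestion, assuming a module-level constant; the replacement value and the PEP 562 mechanism are illustrative:

```python
import warnings

# New name; the string value here is a placeholder assumption.
KNOWN_NON_ISSUES_SHORT_JUSTIFICATION = "Known non-issue"

def __getattr__(name):
    # Module-level __getattr__ (PEP 562) keeps the old constant working
    # while warning callers to migrate to the new name.
    if name == "KNOWN_ISSUES_SHORT_JUSTIFICATION":
        warnings.warn(
            "KNOWN_ISSUES_SHORT_JUSTIFICATION is deprecated; use "
            "KNOWN_NON_ISSUES_SHORT_JUSTIFICATION",
            DeprecationWarning,
            stacklevel=2,
        )
        return KNOWN_NON_ISSUES_SHORT_JUSTIFICATION
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```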

Okay, I've reviewed the diff. Here's a summary of the issues and suggestions:

Review Summary:

The primary change appears to be renaming/reinterpreting TRUE_POSITIVE to ISSUE in the context of CVEValidationStatus. This seems counterintuitive: a true positive is a confirmed vulnerability, while "issue" can read as merely a candidate finding. Additionally, the change from KNOWN_FALSE_POSITIVE_FILE_PATH to KNOWN_NON_ISSUES_FILE_PATH is good, as it more accurately reflects the file's purpose. However, the change in meaning requires careful consideration of its impact throughout the codebase.

File-Specific Comments:

  • nat_tests/core.py

    • No issues.
  • tests/nat_tests/test_pre_process.py

    • self.assertEqual(analysis_resp.investigation_result, CVEValidationStatus.ISSUE.value): Changing from TRUE_POSITIVE to ISSUE seems incorrect. If the analysis determined it's a true positive (an actual vulnerability), it should be TRUE_POSITIVE, not ISSUE. Double-check the logic here.
  • tests/nat_tests/test_utils.py

    • create_sample_per_issue_data_dict and create_sample_analysis_response: Similar to the above, defaulting the is_false_positive parameter to CVEValidationStatus.ISSUE.value is semantically confusing: the parameter name still uses the old terminology while the value uses the new one. Consider renaming the parameter to match the new scheme so the name and the default tell the same story.

    • The redefinition of analysis_response inside TestUtils.create_sast_workflow_tracker_mock has the same problem as above.

  • tests/unit/test_filter_known_issues.py

    • mock_config.KNOWN_FALSE_POSITIVE_FILE_PATH = "valid_file.txt": This change is incomplete; it still sets the old attribute name. It should set the new name (KNOWN_NON_ISSUES_FILE_PATH). Also, rename the test file so its name reflects that it now filters entries from KNOWN_NON_ISSUES_FILE_PATH.

General Suggestions:

  1. Reconsider the TRUE_POSITIVE to ISSUE change: Carefully review the intended meaning of CVEValidationStatus. If ISSUE is intended to mean "we found something that might be an issue" (requiring further investigation), then the changes might be valid, but the code comments and test names should be updated to reflect this new meaning. If ISSUE is intended to represent a confirmed vulnerability, then it's the wrong choice.

  2. Consistency: Ensure that all instances of KNOWN_FALSE_POSITIVE_FILE_PATH are updated to KNOWN_NON_ISSUES_FILE_PATH.

  3. Testing: Add more tests that specifically verify the behavior when an issue is identified as a non-issue (i.e., present in the "known non-issues" file); see the sketch after this list. As noted above, also rename the test file to match KNOWN_NON_ISSUES_FILE_PATH.

  4. Documentation: Update any relevant documentation to reflect the change in terminology and the new meaning of CVEValidationStatus.ISSUE (if applicable).
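A hedged sketch of the suggested test; the filter logic is inlined as a stand-in for the real function, and the config attribute name is taken from this review:

```python
import unittest
from unittest.mock import MagicMock

class TestFilterKnownNonIssues(unittest.TestCase):
    def test_known_non_issue_is_filtered(self):
        # Mock config pointing at the renamed attribute.
        mock_config = MagicMock()
        mock_config.KNOWN_NON_ISSUES_FILE_PATH = "valid_file.txt"

        finding = {"id": "def123", "trace": "known benign shutdown path"}
        known_non_issues = [finding["trace"]]

        # Stand-in for the real filter call; it asserts the contract we
        # expect: anything matching the known non-issues file is excluded.
        result = [f for f in [finding] if f["trace"] not in known_non_issues]
        self.assertEqual(result, [])

if __name__ == "__main__":
    unittest.main()
```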

valid_file.txt (mock path used in the tests above):

- Clarity: The change from `KNOWN_FALSE_POSITIVE_FILE_PATH` to `KNOWN_NON_ISSUES_FILE_PATH` improves clarity; "false positive" can be ambiguous.
- Consistency: Ensure this renaming is applied consistently across the entire codebase.

@nemerna
Copy link
Collaborator

nemerna commented Nov 3, 2025

this has been open for 2 months, is it still needed @Yael-F
