Skip to content

Dummy log type parser#669

Closed
prasoonbirla-google wants to merge 3 commits intomainfrom
pr-demo
Closed

Dummy log type parser#669
prasoonbirla-google wants to merge 3 commits intomainfrom
pr-demo

Conversation

@prasoonbirla-google
Copy link
Copy Markdown
Contributor

Title (Please follow the convention below)

Please use a clear and concise title that summarizes your changes.
If this PR is related to an internal Buganizer ticket, please include its ID at the beginning.

Convention: [Optional Buganizer ID: 123456789] Short, descriptive title of changes

Examples:

  • Fix: Resolve issue with API endpoint returning 500 error
  • [Buganizer ID: 987654321] Feature: Add support for custom data types
  • Docs: Update README with installation instructions

Description

Please provide a detailed description of your changes. This helps reviewers understand your work and its context.

What problem does this PR solve?
(e.g., "Fixes a bug where X was happening," "Implements feature Y to allow Z," "Improves performance of function A.")

How does this PR solve the problem?
(e.g., "Modified algorithm in src/foo.js," "Added new component Bar.vue," "Updated dependency baz to version 1.2.3.")

Any other relevant information (e.g., design choices, tradeoffs, known issues):
(e.g., "Chose approach A over B due to performance considerations," "This change might affect X in certain edge cases," "Requires manual migration steps for existing users.")


Checklist:

Please ensure you have completed the following items before submitting your PR.
This helps us review your contribution faster and more efficiently.

General Checks:

  • I have read and followed the project's contributing.md guide.
  • My code follows the project's coding style guidelines.
  • I have performed a self-review of my own code.
  • My changes do not introduce any new warnings.
  • My changes pass all existing tests.
  • I have added new tests where appropriate to cover my changes. (If applicable)
  • I have updated the documentation where necessary (e.g., README, API docs). (If applicable)

Open-Source Specific Checks:

  • My changes do not introduce any Personally Identifiable Information (PII) or sensitive customer data.
  • My changes do not expose any internal-only code examples, configurations, or URLs.
  • All code examples, comments, and messages are generic and suitable for a public repository.
  • I understand that any internal context or sensitive details related to this work are handled separately in internal systems (Buganizer for Google team members).

For Google Team Members and Reviewers Only:

  • I have included the Buganizer ID in the PR title or description (e.g., "Internal Buganizer ID: 123456789" or "Related Buganizer: go/buganizer/123456789").
  • I have ensured that all internal discussions and PII related to this work remain in Buganizer.
  • I have tagged the PR with one or more labels that reflect the pull request purpose.

Screenshots (If Applicable)

If your changes involve UI or visual elements, please include screenshots or GIFs here.
Ensure any sensitive data is redacted or generalized.


Further Comments / Questions

Any additional comments, questions, or areas where you'd like specific feedback.

@prasoonbirla-google prasoonbirla-google requested a review from a team as a code owner April 9, 2026 10:35
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new log parser configuration designed to handle a specific dummy log format. By implementing this parser, the system gains the ability to ingest and process these logs into a standardized format, ensuring consistency and proper event mapping for downstream analysis.

Highlights

  • New Log Parser Implementation: Added a new dummy log type parser configuration to support ingestion of generic event logs.
  • Metadata and Testing: Included necessary metadata configuration and comprehensive test data to validate the parser's functionality.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new log type parser for DUMMY_LOGTYPE, including its Logstash filter configuration (dummy_logtype.conf), metadata, and associated test data. The review identifies several improvements for the dummy_logtype.conf file, such as simplifying the msg field handling to correctly map to the UDM structure, removing unused field initializations, optimizing the grok and kv filter configurations, and using rename for event_data to ensure temporary fields are cleaned up. A minor formatting issue with trailing whitespace was also noted. Furthermore, a typo was found in metadata.json, and the usecase1 test files were flagged as duplicates of existing test files, suggesting they should either be made unique or removed to streamline the test suite.

Comment on lines +40 to +60
if [msg] != "" {
mutate {
replace => {
"msg_label.value.string_value" => "%{msg}"
}
on_error => "msg_empty"
}
if ![msg_empty] {
mutate {
replace => {
"msg_label.key" => "msg"
}
}
mutate {
merge => {
"event.idm.read_only_udm.additional.fields" => "msg_label"
}
on_error => "msg_label_empty"
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This logic for handling the msg field is overly complex and does not produce the output structure defined in testdata/expected_events/test_events.json. The expected JSON has additional.msg, not an array in additional.fields.

You can greatly simplify this and correct the output by using a single mutate filter with rename to move the msg field to its correct location in the UDM structure. This is more efficient and easier to read.

  if [msg] != "" {
    mutate {
      rename => { "msg" => "[event][idm][read_only_udm][additional][msg]" }
    }
  }

Comment on lines +3 to +10
replace => {
"event_data" => ""
"productlogid" => ""
"kv_msg" => ""
"msg" => ""
"deviceCustomDate1" => ""
"rt" => ""
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The fields productlogid, deviceCustomDate1, and rt are initialized with empty strings but are not used anywhere else in the filter. To improve clarity and avoid confusion, it's best to remove these unused field initializations.

    replace => {
      "event_data" => "",
      "kv_msg" => "",
      "msg" => ""
    }

"%{GREEDYDATA:event_data} \\| %{GREEDYDATA:kv_msg}"
]
}
overwrite => ["event_data" ,"msg" ,"kv_msg"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The msg field is not generated by this grok filter, so including it in the overwrite list is redundant. The msg field is populated later by the kv filter. Removing it from this list will make the configuration clearer.

    overwrite => ["event_data" ,"kv_msg"]

Comment on lines +27 to +32
kv {
source => "kv_msg"
field_split => "#"
value_split => "="
on_error => "kv_failure"
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The kv_msg field is a temporary field used as the source for the kv filter. To keep the final event clean and avoid including intermediate fields, you should remove kv_msg after it has been processed. You can achieve this by adding the remove_field option to the kv filter.

  kv {
    source => "kv_msg"
    field_split => "#"
    value_split => "="
    on_error => "kv_failure"
    remove_field => ["kv_msg"]
  }

Comment on lines +62 to +69
if [event_data] != "" {
mutate {
replace => {
"event.idm.read_only_udm.metadata.description" => "%{event_data}"
}
on_error => "event_data_empty"
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using replace with string interpolation (%{event_data}) copies the field's value but leaves the original event_data field at the root of the event. It's a better practice to use rename, which moves the field to its destination. This is more efficient and ensures temporary fields are cleaned up.

The on_error tag is also redundant here because of the preceding if check.

  if [event_data] != "" {
    mutate {
      rename => { "event_data" => "[event][idm][read_only_udm][metadata][description]" }
    }
  }

"@output" => "event"
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This line contains only whitespace. Please remove it to maintain consistent formatting and improve readability.

{
"product": "DUMMY Product",
"vendor": "Test Vendor",
"description": "Some sort of product from this vendorr.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There appears to be a typo in the description. "vendorr" should likely be "vendor".

Suggested change
"description": "Some sort of product from this vendorr.",
"description": "Some sort of product from this vendor.",

Comment on lines +1 to +38
{
"events": [
{
"event" : {
"timestamp": "2021-03-23T08:20:27.863384Z",
"idm": {
"read_only_udm": {
"metadata": {
"event_timestamp": "2021-03-23T08:20:27.863384Z",
"event_type": "GENERIC_EVENT",
"description": "No New Ingestion Activity"
},
"additional": {
"msg": "No reports have been ingested since MAR 23 2021 00:18:31."
}
}
}
}
},
{
"event" : {
"timestamp": "2021-03-23T08:20:27.863384Z",
"idm": {
"read_only_udm": {
"metadata": {
"event_timestamp": "2021-03-23T08:20:27.863384Z",
"event_type": "GENERIC_EVENT",
"description": "No New Ingestion Activity"
},
"additional": {
"msg": "No reports have been ingested since MAR 23 2021 00:18:32."
}
}
}
}
}
]
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file appears to be an exact duplicate of test_events.json. Similarly, usecase1_log.json is a duplicate of test_log.json. If usecase1 is intended to cover a different scenario, please update the files with unique test data. If not, consider removing the redundant usecase1 files to simplify the test suite and reduce maintenance overhead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant