
XML reader is not working as expected #86

@shrutimantri

Description

Expected Behavior

When an XML file containing repeated item elements is read, each item should be emitted as a separate record in the Ion output file, without an items or item wrapper key.
Example:
The following XML file:

<?xml version='1.0' encoding='UTF-8'?>
<items>
  <item>
    <job_title>BI Data Analyst</job_title>
    <avg_salary>836644.8</avg_salary>
  </item>
  <item>
    <job_title>ML Engineer</job_title>
    <avg_salary>679247.63</avg_salary>
  </item>
  <item>
    <job_title>Data Science Manager</job_title>
    <avg_salary>391371.17</avg_salary>
  </item>
  <item>
    <job_title>Business Data Analyst</job_title>
    <avg_salary>286000.0</avg_salary>
  </item>
  <item>
    <job_title>Data Scientist</job_title>
    <avg_salary>257422.32</avg_salary>
  </item>
  <item>
    <job_title>Computer Vision Engineer</job_title>
    <avg_salary>220583.33</avg_salary>
  </item>
  <item>
    <job_title>AI Scientist</job_title>
    <avg_salary>193666.67</avg_salary>
  </item>
  <item>
    <job_title>Applied Scientist</job_title>
    <avg_salary>190614.29</avg_salary>
  </item>
  <item>
    <job_title>Machine Learning Engineer</job_title>
    <avg_salary>175270.55</avg_salary>
  </item>
  <item>
    <job_title>Research Scientist</job_title>
    <avg_salary>161292.29</avg_salary>
  </item>
  <item>
    <job_title>Data Architect</job_title>
    <avg_salary>160283.26</avg_salary>
  </item>
  <item>
    <job_title>Data Engineer</job_title>
    <avg_salary>157510.03</avg_salary>
  </item>
  <item>
    <job_title>Machine Learning Scientist</job_title>
    <avg_salary>154638.64</avg_salary>
  </item>
  <item>
    <job_title>Research Engineer</job_title>
    <avg_salary>146618.11</avg_salary>
  </item>
  <item>
    <job_title>Analytics Engineer</job_title>
    <avg_salary>142703.15</avg_salary>
  </item>
  <item>
    <job_title>Data Science Consultant</job_title>
    <avg_salary>141937.5</avg_salary>
  </item>
  <item>
    <job_title>Data Analytics Manager</job_title>
    <avg_salary>141463.33</avg_salary>
  </item>
  <item>
    <job_title>Machine Learning Infrastructure Engineer</job_title>
    <avg_salary>141076.36</avg_salary>
  </item>
  <item>
    <job_title>BI Developer</job_title>
    <avg_salary>129846.15</avg_salary>
  </item>
  <item>
    <job_title>Data Specialist</job_title>
    <avg_salary>122083.33</avg_salary>
  </item>
  <item>
    <job_title>Data Manager</job_title>
    <avg_salary>120203.05</avg_salary>
  </item>
  <item>
    <job_title>Data Analyst</job_title>
    <avg_salary>116348.29</avg_salary>
  </item>
</items>

should be read by the XML reader as one record per item:
(Screenshot 2024-02-05 at 1:41:38 PM: expected reader output)
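The expected behavior can be illustrated with a minimal Python sketch. It is not Kestra's XmlReader (a Java plugin) and only models the record shape requested above: the root element is treated as a container, each child element becomes one record, and its sub-elements become that record's fields. The helper names xml_to_records and _coerce, and the rule of converting any text that parses as a float into a number, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def _coerce(text):
    """Return a float if the text parses as one, otherwise the original string."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text


def xml_to_records(xml_text):
    """Hypothetical helper: one record per child of the root element.

    Neither the root tag (<items>) nor the repeated tag (<item>) is kept
    as a key; each <item>'s sub-elements become the fields of one record.
    """
    root = ET.fromstring(xml_text)
    return [{field.tag: _coerce(field.text) for field in child} for child in root]
```

Applied to the file above, this would return 22 records, the first being {'job_title': 'BI Data Analyst', 'avg_salary': 836644.8}.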

Actual Behavior

The same XML file (shown above) is read by the XML reader as a single record, with every entry nested in an array under the item key:

{"item":[{"avg_salary":836644.8,"job_title":"BI Data Analyst"},{"avg_salary":679247.63,"job_title":"ML Engineer"},{"avg_salary":391371.17,"job_title":"Data Science Manager"},{"avg_salary":286000,"job_title":"Business Data Analyst"},{"avg_salary":257422.32,"job_title":"Data Scientist"},{"avg_salary":220583.33,"job_title":"Computer Vision Engineer"},{"avg_salary":193666.67,"job_title":"AI Scientist"},{"avg_salary":190614.29,"job_title":"Applied Scientist"},{"avg_salary":175270.55,"job_title":"Machine Learning Engineer"},{"avg_salary":161292.29,"job_title":"Research Scientist"},{"avg_salary":160283.26,"job_title":"Data Architect"},{"avg_salary":157510.03,"job_title":"Data Engineer"},{"avg_salary":154638.64,"job_title":"Machine Learning Scientist"},{"avg_salary":146618.11,"job_title":"Research Engineer"},{"avg_salary":142703.15,"job_title":"Analytics Engineer"},{"avg_salary":141937.5,"job_title":"Data Science Consultant"},{"avg_salary":141463.33,"job_title":"Data Analytics Manager"},{"avg_salary":141076.36,"job_title":"Machine Learning Infrastructure Engineer"},{"avg_salary":129846.15,"job_title":"BI Developer"},{"avg_salary":122083.33,"job_title":"Data Specialist"},{"avg_salary":120203.05,"job_title":"Data Manager"},{"avg_salary":116348.29,"job_title":"Data Analyst"}]}
(Screenshot 2024-02-05 at 1:43:14 PM: actual reader output)
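The actual output looks like what a generic XML-to-JSON conversion produces when the root tag is dropped and repeated sibling tags are collected into an array: the whole file collapses into one record. The Python sketch below models that shape (simplified: it always builds a list, even for a single child) together with a hypothetical unwrap helper that flattens the array back into one record per entry. Neither function reflects the plugin's actual code; both names are illustrative.

```python
import xml.etree.ElementTree as ET


def _coerce(text):
    """Return a float if the text parses as one, otherwise the original string."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text


def xml_to_single_record(xml_text):
    """Simplified model of the reported output: the root tag is dropped,
    children are grouped into a list under their tag name, and the whole
    document becomes ONE record."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for child in root:
        fields = {f.tag: _coerce(f.text) for f in child}
        grouped.setdefault(child.tag, []).append(fields)
    return [grouped]


def unwrap(records, key="item"):
    """Workaround sketch: flatten the wrapper list into one record per entry."""
    return [entry for record in records for entry in record.get(key, [])]
```

For the file in this issue, xml_to_single_record yields one record whose item list has 22 entries, matching the shape of the JSON above; unwrap turns it back into the 22 records the Expected Behavior section asks for.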

Steps To Reproduce

  1. Run the following flow:
id: xml-writer
namespace: company.team
description: Analyse data salaries.
tasks:
  - id: download_csv
    type: io.kestra.plugin.fs.http.Download
    description: Data Job salaries from 2020 to 2023 (source ai-jobs.net)
    uri: https://gist.githubusercontent.com/Ben8t/f182c57f4f71f350a54c65501d30687e/raw/940654a8ef6010560a44ad4ff1d7b24c708ebad4/salary-data.csv

  - id: average_salary_by_position
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      data.csv: "{{ outputs.download_csv.uri }}"
    sql: |
      SELECT
        job_title,
        ROUND(AVG(salary), 2) AS avg_salary
      FROM read_csv_auto('{{workingDir}}/data.csv', header=True)
      GROUP BY job_title
      HAVING COUNT(job_title) > 10
      ORDER BY avg_salary DESC;
    store: true
  - id: export_result
    type: "io.kestra.plugin.serdes.xml.XmlWriter"
    from: "{{ outputs.average_salary_by_position.uri }}"
  - id: xml_reader
    type: io.kestra.plugin.serdes.xml.XmlReader
    from: "{{ outputs.export_result.uri }}"
  2. Check the output of the xml_reader task.
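The flow above is effectively a write-then-read round trip, so the expected behavior can be phrased as a property: reading what XmlWriter produced should give back the original rows. A minimal Python sketch of that property, assuming the <items>/<item> layout shown in this issue; write_items and read_items are hypothetical stand-ins, not the plugin API, and values are kept as strings to avoid type coercion.

```python
import xml.etree.ElementTree as ET


def write_items(rows):
    """Hypothetical writer: each row becomes an <item> under an <items> root,
    mirroring the file layout shown in this issue."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_items(xml_text):
    """Hypothetical reader with the expected behavior: one record per <item>."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in item} for item in root]


rows = [
    {"job_title": "BI Data Analyst", "avg_salary": "836644.8"},
    {"job_title": "ML Engineer", "avg_salary": "679247.63"},
]
# The round trip should be lossless: same number of records, same fields.
assert read_items(write_items(rows)) == rows
```

In the reported behavior, the read side instead returns a single record wrapping all rows, so this property fails.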

Environment Information

  • Kestra Version: 0.13.8
  • Plugin version: 0.13.8
  • Operating System (OS / Docker / Kubernetes): Docker
  • Java Version (If not docker):

Metadata

  • Assignees: no one assigned
  • Labels: area/plugin (plugin-related issue or feature request), bug (something isn't working), good first issue (great issue for new contributors)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests