fix(metadata,cli,frontend): ingest jsPsych CSVs with unescaped quotes #132
Mandyx22 wants to merge 2 commits into
jsPsych can export the `stimulus` column as unquoted HTML containing literal `"` (e.g. `<div class = "EncodingBox">`), which violates strict RFC-4180 quoting. Previously csv-parse threw "Invalid Opening Quote" and the entire file was dropped, making such datasets unreadable end to end (observed on a 1258-file OSF working-memory dataset: 0 files read).

- parseCSV sets `relax_quotes: true` so the row parses instead of being rejected.
- New `parseCSVForWrite` reports whether the content was already strictly valid CSV. The CLI and frontend use it so a clean file keeps its exact bytes (verbatim), while a file that only parsed thanks to relaxation is re-serialised to well-formed CSV; otherwise the malformed bytes land in the Psych-DS data/ payload and the validator rejects them with CSV_FORMATTING_ERROR.

Net: these datasets now ingest and pass Psych-DS validation through both the library/CLI and the browser uploader. Adds regression tests for the parse and the re-serialise-on-write behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
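For illustration, the re-serialise step for files that only parsed under relaxation could look roughly like the sketch below. `escapeField` and `toWellFormedCSV` are hypothetical names, not the PR's actual code, and the sketch assumes string-valued rows and LF line endings. A field is quoted only when it contains a quote, comma, or line break, and inner quotes are doubled as RFC 4180 requires.

```typescript
// Hypothetical helpers for illustration; not the PR's implementation.

/** Quote a field only when needed, doubling any inner double quotes. */
function escapeField(value: string): string {
  const needsQuoting = /[",\r\n]/.test(value);
  return needsQuoting ? `"${value.replace(/"/g, '""')}"` : value;
}

/** Serialise a header list plus string-valued records into well-formed CSV. */
function toWellFormedCSV(
  headers: string[],
  rows: Record<string, string>[],
): string {
  const lines = [headers.map(escapeField).join(",")];
  for (const row of rows) {
    // Missing keys become empty cells so every line has the same field count.
    lines.push(headers.map((h) => escapeField(row[h] ?? "")).join(","));
  }
  return lines.join("\n") + "\n";
}

// Example row modelled on the stimulus quoted in the commit message.
const reserialised = toWellFormedCSV(
  ["trial_type", "stimulus", "rt"],
  [
    {
      trial_type: "html-keyboard-response",
      stimulus: '<div class = "EncodingBox">',
      rt: "512",
    },
  ],
);
// The stimulus cell is written as "<div class = ""EncodingBox"">".
```

Cells without special characters are left unquoted, so the rewritten file differs from the original only where the original was malformed.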
🦋 Changeset detected

Latest commit: 0dc55d0

The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
jsPsych stimulus HTML often contains both a literal `"` and a `,`. `relax_quotes` keeps the quote literal, but the comma still splits the field, so csv-parse throws "Invalid Record Length" and the file is still dropped.

Documents the gap as a `test.failing` so CI stays green and the spec flips to a hard failure once comma-bearing stimuli ingest correctly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
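To make the quote+comma gap concrete, here is a small dependency-free sketch. `splitRelaxed` is a hypothetical, simplified stand-in for relaxed-quote splitting, not csv-parse itself: a quote opens a quoted field only at the start of a field, and anywhere else it is kept as a literal character. The column names and row values are made up for the example.

```typescript
/**
 * Simplified stand-in for relaxed-quote field splitting (NOT csv-parse).
 * A quote at the start of a field opens a quoted field; a quote anywhere
 * else is literal, so every comma outside a quoted field still splits.
 */
function splitRelaxed(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  let atFieldStart = true;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"' && atFieldStart) {
      inQuotes = true;
      atFieldStart = false;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
      atFieldStart = true;
    } else {
      current += ch; // includes literal quotes in unquoted fields
      atFieldStart = false;
    }
  }
  fields.push(current);
  return fields;
}

const header = splitRelaxed("trial_type,stimulus,rt,correct");
const quoteOnly = splitRelaxed(
  'html-keyboard-response,<div class = "EncodingBox">,512,true',
);
const quoteAndComma = splitRelaxed(
  'html-keyboard-response,<div class = "EncodingBox">A, B</div>,512,true',
);
// header has 4 fields, quoteOnly has 4, quoteAndComma has 5.
```

The quote-only row lines up with the four headers, while the inner comma in the last row yields a fifth field, which is the record-length mismatch described above.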
The fix handles quote-only fields but not the common quote+comma case.
The stimulus's inner comma makes the row 5 fields against 4 headers, so csv-parse throws […]. I pushed a target spec documenting it (commit 0dc55d0, […]). Note that "make it parse" isn't free: the only csv-parse knob that lets these through is […]
Problem
Some jsPsych experiments export the `stimulus` column as unquoted HTML containing literal `"` (e.g. `<div class = "EncodingBox">`), which violates strict RFC-4180 quoting. Surfaced on a real 1258-file OSF working-memory dataset (osf.io/phxq4) where the tool read 0 files: `csv-parse` threw `Invalid Opening Quote` and dropped every file. Both the CLI and the browser uploader were affected.

There were two layers to fix:
- […] `data/` payload, and the validator (which also strict-parses CSV) rejected it with `CSV_FORMATTING_ERROR`.

Changes
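- `parseCSV` now sets `relax_quotes: true`, so a quote inside an unquoted field no longer throws and drops the file; the HTML is kept intact.
- New `parseCSVForWrite` helper returns the parsed rows plus a `verbatimSafe` flag (strict-parse probe). The CLI and frontend use it so that:
  - a clean file keeps its exact bytes (verbatim);
  - a file that only parsed thanks to relaxation is re-serialised to well-formed CSV.

Result (verified end to end)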
[Results table not recoverable; surviving cell fragments: `CSV_FORMATTING_ERROR`, `DataUpload)`, `error`]

Clean CSVs are unaffected (still verbatim); only files that were previously 100% unreadable change.
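The verbatim-versus-re-serialise decision can be sketched as follows. This is a simplified illustration under stated assumptions: the PR's probe strict-parses with csv-parse, whereas `isStrictlyQuoted` below is a hypothetical hand-rolled check that inspects quoting only (it does not compare column counts), and `pickPayload` is likewise an invented name.

```typescript
/**
 * Hypothetical strict-quoting probe (a stand-in for a strict parse):
 * returns false if a quote appears inside an unquoted field, if text
 * follows a closing quote, or if a quoted field is never closed.
 */
function isStrictlyQuoted(content: string): boolean {
  type State = "fieldStart" | "unquoted" | "quoted" | "quoteInQuoted";
  let state: State = "fieldStart";
  for (let i = 0; i < content.length; i++) {
    const ch = content.charAt(i);
    const isDelimiter = ch === "," || ch === "\n" || ch === "\r";
    if (state === "fieldStart") {
      if (ch === '"') state = "quoted";
      else if (!isDelimiter) state = "unquoted";
    } else if (state === "unquoted") {
      if (ch === '"') return false; // bare quote in an unquoted field
      if (isDelimiter) state = "fieldStart";
    } else if (state === "quoted") {
      if (ch === '"') state = "quoteInQuoted";
    } else {
      // Just saw a quote inside a quoted field: escape, close, or error.
      if (ch === '"') state = "quoted";
      else if (isDelimiter) state = "fieldStart";
      else return false;
    }
  }
  return state !== "quoted"; // unterminated quoted field is invalid
}

/** Keep the original bytes when they are already clean, else use the rewrite. */
function pickPayload(
  original: string,
  reserialised: string,
): { content: string; verbatimSafe: boolean } {
  const verbatimSafe = isStrictlyQuoted(original);
  return { content: verbatimSafe ? original : reserialised, verbatimSafe };
}

const cleanFile = 'a,b\n1,"x ""y"" z"\n';
const relaxedFile = 'a,b\n1,<div class = "EncodingBox">\n';
const cleanResult = pickPayload(cleanFile, "unused rewrite");
const relaxedResult = pickPayload(
  relaxedFile,
  'a,b\n1,"<div class = ""EncodingBox"">"\n',
);
// cleanResult.verbatimSafe is true; relaxedResult.verbatimSafe is false.
```

Probing the original bytes, rather than always re-serialising, is what keeps already-valid files byte-identical.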
Tests

- […] `"` now parses instead of dropping the file.
- […] `DataUpload` mock for the new helper.

🤖 Generated with Claude Code