41 commits
- `2f2b014` stub of command-line js-only version of index.html (scottleibrand, Feb 10, 2017)
- `6d5f4d6` Update README.md (danamlewis, Feb 10, 2017)
- `e277869` basic skeleton: reads in a json file and prints it stringified (scottleibrand, Feb 10, 2017)
- `c5d5c28` Merge pull request #1 from scottleibrand/gh-pages (danamlewis, Feb 10, 2017)
- `b8c1e35` call doCSV, minus the html bits (scottleibrand, Feb 10, 2017)
- `365f289` require stuff (scottleibrand, Feb 10, 2017)
- `896e1b4` convert functions to node-compatible form (scottleibrand, Feb 12, 2017)
- `1c0afa6` make everything work (scottleibrand, Feb 12, 2017)
- `4f96688` Merge pull request #2 from scottleibrand/gh-pages (danamlewis, Feb 12, 2017)
- `799225e` print inputData.length first (scottleibrand, Feb 12, 2017)
- `0171d7e` try splitting data into 100k-record chunks (scottleibrand, Feb 12, 2017)
- `982a50f` gonna do this a different way (scottleibrand, Feb 12, 2017)
- `6b49674` Merge branch 'gh-pages' of https://github.com/danamlewis/json into gh… (danamlewis, Feb 12, 2017)
- `98c7429` This creates a script to split a large json file into manageable chunks (danamlewis, Feb 12, 2017)
- `c9ba62f` renamed (danamlewis, Feb 12, 2017)
- `58dbcb6` Add running directions (danamlewis, Feb 12, 2017)
- `0a55caf` Creating a package.json (danamlewis, Feb 12, 2017)
- `a59a033` disable debug line (scottleibrand, Feb 12, 2017)
- `acbbe98` clean up filenames; print progress lines (scottleibrand, Feb 12, 2017)
- `4217360` progress message (scottleibrand, Feb 12, 2017)
- `4a0ecb7` Merge pull request #4 from scottleibrand/gh-pages (danamlewis, Feb 12, 2017)
- `b1e996b` Removing web-related functions that are not needed for command line (danamlewis, Feb 13, 2017)
- `f84cf0c` Typo fix for commenting out (danamlewis, Feb 13, 2017)
- `26d06fc` Another missed / (danamlewis, Feb 13, 2017)
- `fcb982e` Uncomment showCSV (danamlewis, Feb 13, 2017)
- `cb81c75` Removing web-version related code unneeded for command line usage (danamlewis, Feb 13, 2017)
- `a55c91b` Add missing } (danamlewis, Feb 13, 2017)
- `c487490` Update README.md (danamlewis, Feb 13, 2017)
- `5f2aa91` Add command line instructions to README (danamlewis, Feb 13, 2017)
- `1c78c79` Bug fixed for no input file and unspecified record length plus error … (danamlewis, Feb 13, 2017)
- `d633d60` Merge branch 'gh-pages' of https://github.com/danamlewis/json into gh… (danamlewis, Feb 13, 2017)
- `189b916` Clean up previous error checking no longer needed (danamlewis, Feb 13, 2017)
- `dab89af` Update README with clarified command line instructions for each tool (danamlewis, Feb 13, 2017)
- `a2aea27` Adjust README for npm patch (danamlewis, Feb 13, 2017)
- `33a5d60` 0.0.2 (danamlewis, Feb 13, 2017)
- `a513d26` Merge branch 'gh-pages' of https://github.com/danamlewis/json into gh… (danamlewis, Feb 13, 2017)
- `ab2bb20` 0.0.3 (danamlewis, Feb 13, 2017)
- `32ba6d0` Re-convert README to prep for PR to konklone (danamlewis, Feb 13, 2017)
- `4596bf6` Merge branch 'gh-pages' into gh-pages (konklone, Jun 11, 2017)
- `b77f006` update dependencies in README (edengh, Apr 20, 2018)
- `e44f85f` Merge pull request #6 from edengh/gh-pages (danamlewis, Apr 22, 2018)
23 changes: 18 additions & 5 deletions README.md
@@ -1,13 +1,26 @@
-## JSON to CSV Converter
+## Complex JSON to CSV Converter
 
A simple JSON to CSV converter that handles objects and nested documents.
 
-Conversion happens inside the browser, in straight JavaScript. It may choke on large files.
+* In the **web** version, conversion happens inside the browser, in straight JavaScript. It may choke on large files.
+* With complex-json2csv.js, it can be run via the **command line**. It uses jsonsplit.sh to deal with large files.
 
-Please file all bugs [in the issue tracker](https://github.com/konklone/json/issues).
+To install and run via the command line:
+* `npm install -g complex-json2csv`
+* Type the name of the command and provide an input file:
+* `complex-json2csv inputfile.json` - prints the output to the screen
+* `complex-json2csv inputfile.json > outputfile.csv` - writes the output to a CSV file
+* `jsonsplit inputfile.json [records]` - splits the file into chunks of [records] records each
+* If you do not specify a size, it defaults to splitting by 100,000 records
 
-Read more about the converter and why I built it: "[Making JSON as simple as a spreadsheet](http://sunlightfoundation.com/blog/2014/03/11/making-json-as-simple-as-a-spreadsheet/)".
+If not already done, also remember to install `json` via the command line:
+* `npm install -g json`
+
+(Web tool originally from https://github.com/konklone/json; command-line tools complex-json2csv and jsonsplit by [@DanaMLewis](https://github.com/danamlewis).)
+
+Please file all bugs [in the issue tracker](https://github.com/konklone/json/issues).
+
+Read more about the converter and why I (@konklone) built it: "[Making JSON as simple as a spreadsheet](http://sunlightfoundation.com/blog/2014/03/11/making-json-as-simple-as-a-spreadsheet/)".

## Public domain

@@ -17,4 +30,4 @@ All **other files** in this project are [dedicated to the public domain](LICENSE

> The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](http://creativecommons.org/publicdomain/zero/1.0/).

-> All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.
\ No newline at end of file
+> All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.
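The "objects and nested documents" handling the README describes amounts to flattening nested keys into compound column names. A minimal standalone sketch of that idea (a hypothetical helper, not the repo's actual `parse_object`, and the dot separator is an assumption):

```javascript
// Flatten a nested object into a single-level object whose keys are
// dot-joined paths, suitable for use as CSV column names.
function flatten(obj, path, out) {
  path = path || "";
  out = out || {};
  for (var key in obj) {
    var newPath = path ? path + "." + key : key;
    if (obj[key] !== null && typeof obj[key] === "object") {
      // recurse into nested objects/arrays
      flatten(obj[key], newPath, out);
    } else {
      out[newPath] = obj[key];
    }
  }
  return out;
}

var flatRow = flatten({ name: "a", meta: { size: 2, tags: { x: 1 } } });
// flatRow → { "name": "a", "meta.size": 2, "meta.tags.x": 1 }
```

Each flattened row can then be handed to a CSV serializer (the repo uses jquery-csv's `$.csv.fromObjects` for this step).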
21 changes: 11 additions & 10 deletions assets/site.js
@@ -13,7 +13,7 @@ Events = {

}

-function getParam(name) {
+getParam = function(name) {
name = name.replace(/[\[]/, "\\[").replace(/[\]]/, "\\]");
var regex = new RegExp("[\\?&]" + name + "=([^&#]*)"),
results = regex.exec(location.search);
@@ -25,7 +25,7 @@ function getParam(name) {

// depends on jquery and jquery-csv (for now)

-function parse_object(obj, path) {
+parse_object = function(obj, path) {
if (path == undefined)
path = "";

@@ -56,8 +56,9 @@ function parse_object(obj, path) {


// otherwise, just find the first one
-function arrayFrom(json) {
+arrayFrom = function(json) {
var queue = [], next = json;

while (next !== undefined) {
if ($.type(next) == "array") {

@@ -82,15 +83,15 @@

// adapted from Mattias Petter Johanssen:
// https://www.quora.com/How-can-I-parse-unquoted-JSON-with-JavaScript/answer/Mattias-Petter-Johansson
-function quoteKeys(input) {
+quoteKeys = function(input) {
return input.replace(/(['"])?([a-zA-Z0-9_]+)(['"])?:/g, '"$2": ');
}

-function removeSmartQuotes(input) {
+removeSmartQuotes = function(input) {
return input.replace(/[“”]/g, "\"");
}

-function removeTrailingComma(input) {
+removeTrailingComma = function(input) {
if (input.slice(-1) == ",")
return input.slice(0,-1);
else
@@ -100,19 +101,19 @@ function removeTrailingComma(input) {
// Rudimentary, imperfect detection of JSON Lines (http://jsonlines.org):
//
// Is there a closing brace and an opening brace with only whitespace between?
-function isJSONLines(string) {
+isJSONLines = function(string) {
return !!(string.match(/\}\s+\{/))
}

// To convert JSON Lines to JSON:
// * Add a comma between spaced braces
// * Surround with array brackets
-function linesToJSON(string) {
+linesToJSON = function(string) {
return "[" + string.replace(/\}\s+\{/g, "}, {") + "]";
}

// todo: add graceful error handling
-function jsonFrom(input) {
+jsonFrom = function(input) {
var string = $.trim(input);
if (!string) return;

@@ -158,4 +159,4 @@ function jsonFrom(input) {
console.log("Nope: that didn't work either. No good.")

return result;
-}
\ No newline at end of file
+}
156 changes: 156 additions & 0 deletions complex-json2csv.js
@@ -0,0 +1,156 @@
#!/usr/bin/env node

var input;

function usage ( ) {
console.log('usage: ', process.argv.slice(0, 2), 'inputfile.json');
}

if (!module.parent) {

var jsdom = require("jsdom").jsdom;
global.window = jsdom().defaultView;
global.jQuery = global.$ = require("jquery");

require('./assets/jquery-2.1.1.min.js');
require('./assets/jquery.csv.js');
require('./assets/site.js');
var inputFileName = process.argv.slice(2, 3).pop();

if (['--help', '-h', 'help'].indexOf(inputFileName) > -1) {
usage( );
process.exit(0);
}
if (!inputFileName) {
usage( );
process.exit(1);
}

var fs = require('fs');
var cwd = process.cwd()
try {
var inputData = JSON.parse(fs.readFileSync(inputFileName, 'utf8'));
} catch (e) {
return console.error("Could not parse input file: ", e);
}

//console.error(JSON.stringify(inputData));
//console.error("About to convert",inputData.length,"records to CSV");
doCSV(inputData);
}

function doJSON() {
// just in case
$(".drop").hide();

// get input JSON, try to parse it
var newInput = $(".json textarea").val();
if (newInput == input) return;

input = newInput;
if (!input) {
// wipe the rendered version too
$(".json code").html("");
return;
}

var json = jsonFrom(input);

// if succeeded, prettify and highlight it
// highlight shows when textarea loses focus
if (json) {
// Reset any error message from previous failed parses.
$("div.error").hide();
$("div.warning").show();

var pretty = JSON.stringify(json, undefined, 2);
$(".json code").html(pretty);
if (pretty.length < (50 * 1024))
hljs.highlightBlock($(".json code").get(0));

// convert to CSV, make available
doCSV(json);
} else {
// Show error.
$("div.warning").hide();
$("div.error").show();
$(".json code").html("");
}

// Either way, update the error-reporting link to include the latest.
setErrorReporting(null, input);

return true;
}


function showCSV(rendered) {
if (rendered) {
if ($(".csv table").html()) {
$(".csv .rendered").show();
$(".csv .editing").hide();
}
} else {
$(".csv .rendered").hide();
$(".csv .editing").show().focus();
}
}

// takes an array of flat JSON objects, converts them to arrays
// renders them into a small table as an example
function renderCSV(objects) {
var rows = $.csv.fromObjects(objects, {justArrays: true});
if (rows.length < 1) return;

// find CSV table
var table = $(".csv table")[0];
$(table).html("");

// render header row
var thead = document.createElement("thead");
var tr = document.createElement("tr");
var header = rows[0];
for (var field in header) {
var th = document.createElement("th");
$(th).html(header[field])
tr.appendChild(th);
}
thead.appendChild(tr);

// render body of table
var tbody = document.createElement("tbody");
for (var i=1; i<rows.length; i++) {
tr = document.createElement("tr");
for (var field in rows[i]) {
var td = document.createElement("td");
$(td)
.html(rows[i][field])
.attr("title", rows[i][field]);
tr.appendChild(td);
}
tbody.appendChild(tr);
}

table.appendChild(thead);
table.appendChild(tbody);
}

function doCSV(json) {
// 1) find the primary array to iterate over
// 2) for each item in that array, recursively flatten it into a tabular object
// 3) turn that tabular object into a CSV row using jquery-csv
var inArray = arrayFrom(json);

var outArray = [];
for (var row in inArray)
outArray[outArray.length] = parse_object(inArray[row]);

//$("span.rows.count").text("" + outArray.length);

var csv = $.csv.fromObjects(outArray);
console.log(csv);
// excerpt and render first 10 rows
//renderCSV(outArray.slice(0, excerptRows));
showCSV(true);

}
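`doCSV` above leans on `arrayFrom` (from assets/site.js) to locate the primary array to iterate over. A rough standalone sketch of that breadth-first search, using plain `Array.isArray` instead of the repo's jQuery `$.type` check:

```javascript
// Breadth-first search for the first array anywhere in a parsed
// JSON document; that array is treated as the set of records.
function firstArray(json) {
  var queue = [json];
  while (queue.length) {
    var next = queue.shift();
    if (Array.isArray(next)) return next;
    if (next && typeof next === "object") {
      // not an array: enqueue each value and keep searching
      for (var key in next) queue.push(next[key]);
    }
  }
  // no array found anywhere
  return undefined;
}

var records = firstArray({ meta: { count: 2 }, data: [{ id: 1 }, { id: 2 }] });
// records → [{ id: 1 }, { id: 2 }]
```

Breadth-first order matters here: the shallowest array wins, which is usually the document's main record set rather than some nested detail list.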
40 changes: 40 additions & 0 deletions jsonsplit.sh
@@ -0,0 +1,40 @@
#!/bin/bash

# exit immediately on any error or use of an unset variable
set -eu

#if not specified, then set the records to 100,000
if [ $# -eq 0 ]; then
echo "Error: Please provide an input file."
echo "Usage: $0 inputfile.json [records]"
exit 1
elif [ $# -eq 1 ]; then
records=100000
else
records=$2
fi

#get input file
fullinputfile=$1
inputfile=${fullinputfile%.json}
partsdir=${inputfile}_parts

#create a directory to store the split files
mkdir -p "$partsdir"

#splits the input file into (record size) chunks
cat "$fullinputfile" | jq -c -M '.[]' | split -l "$records" - "$partsdir/${inputfile}_"

#before json -g we need to get rid of existing jsons
rm "$partsdir"/*.json 2>&1 | grep -v "No such file" || echo -n ""

#converts chunked files into json array
cd $partsdir/
ls | while read file; do echo -n "." > /dev/stderr; done; echo > /dev/stderr
echo "Grouping split records into valid json..."
ls | while read file; do
cat "$file" | json -g > "$file.json"
echo -n "-" > /dev/stderr
done
echo > /dev/stderr
cd ..
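Stripped of the shell plumbing (`jq` to emit one record per line, `split -l` to cut the stream, `json -g` to regroup each piece into a valid array), jsonsplit.sh's core job is fixed-size chunking of one large array. A plain-JS sketch of the same idea:

```javascript
// Break one large array of records into chunks of at most `size`
// records each; the last chunk holds whatever remains.
function chunk(records, size) {
  var chunks = [];
  for (var i = 0; i < records.length; i += size) {
    chunks.push(records.slice(i, i + size));
  }
  return chunks;
}

var parts = chunk([1, 2, 3, 4, 5], 2);
// parts → [[1, 2], [3, 4], [5]]
```

Each resulting chunk corresponds to one `*_aa.json`-style part file that complex-json2csv can then convert independently.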
19 changes: 19 additions & 0 deletions package.json
@@ -0,0 +1,19 @@
{
"name": "complex-json2csv",
"preferGlobal": true,
"version": "0.0.3",
"author": "Dana M. Lewis",
"description": "command line json2csv converter that supports super large files and complex, unknown json schemas",
"license": "MIT",
"engines": {
"node": ">=0.10"
},
"bin": {
"complex-json2csv": "./complex-json2csv.js",
"jsonsplit": "./jsonsplit.sh"
},
"dependencies": {
"jquery": "~3.1.1",
"jsdom": "~9.11.0"
}
}