A minimal and portable JSON tokenizer written in standard C and C++ (two separate versions). Performs validating and highly efficient parsing suitable for reading JSON directly into custom data structures. There are no code dependencies — simply include jsont.{h,hh,c,cc} in your project.
Build and run unit tests:
make
C API:
jsont_ctx_t* S = jsont_create(0);
jsont_reset(S, uint8_t* inbuf, size_t inbuf_len);
tok = jsont_next(S);
// branch on `tok` ...
V = jsont_*_value(S[, ...]);
jsont_destroy(S);

New C++ API:
jsont::Tokenizer S(const char* inbuf, size_t length);
jsont::Token token;
while ((token = S.next())) {
  if (token == jsont::Float) {
    printf("%g\n", S.floatValue());
  } ... else if (token == jsont::Error) {
    // handle error
    break;
  }
}

jsont::Builder json;
json.startObject()
    .fieldName("foo").value(123.45)
    .fieldName("bar").startArray()
      .value(678)
      .value("nine \"ten\"")
    .endArray()
  .endObject();
std::cout << json.toString() << std::endl;
// {"foo":123.45,"bar":[678,"nine \"ten\""]}

See `jsont.h` and `jsont.hh` for a complete overview of the API, including more detailed documentation. Here's an overview:
- `Builder build()`: Convenience builder factory.
`Tokenizer` reads a sequence of bytes and produces tokens and values while doing so.

- `Tokenizer(const char* bytes, size_t length, TextEncoding encoding)`: Initializes a new Tokenizer to read `bytes` of `length` in `encoding`.
- `void reset(const char* bytes, size_t length, TextEncoding encoding)`: Resets the tokenizer so that it can be reused, which avoids unnecessary memory allocation and deallocation.

- `const Token& next() throw(Error)`: Reads the next token, possibly throwing an `Error`.
- `const Token& current() const`: Accesses the current token.

- `bool hasValue() const`: True if the current token has a value.
- `size_t dataValue(const char const** bytes)`: Returns a slice of the input which represents the current value, or nothing (returns 0) if the current token has no value (e.g. start of an object).
- `std::string stringValue() const`: Returns a copy of the current string value.
- `double floatValue() const`: Returns the current value as a double-precision floating-point number.
- `int64_t intValue() const`: Returns the current value as a signed 64-bit integer.

- `ErrorCode error() const`: Returns the error code of the last error.
- `const char* errorMessage() const`: Returns a human-readable message for the last error. Never returns NULL.

- `const char* inputBytes() const`: A pointer to the input data as passed to `reset` or the constructor.
- `size_t inputSize() const`: Total number of input bytes.
- `size_t inputOffset() const`: The tokenizer's current byte offset into the input. In the event of an error, this points to the source of the error.
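Since `inputOffset()` is a byte offset, error reports are often easier to read once it is translated into a line and column. The following is a minimal sketch of that translation. It relies only on the three accessors listed above; `LineColumn` and `lineColumnAt` are hypothetical names introduced for illustration and are not part of jsont.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type, not part of jsont.
struct LineColumn {
  std::size_t line;    // 1-based line number
  std::size_t column;  // 1-based column, counted in bytes
};

// Translate a byte offset into a 1-based line/column pair by scanning
// the input for '\n'. Offsets past the end are clamped to `size`.
inline LineColumn lineColumnAt(const char* bytes, std::size_t size,
                               std::size_t offset) {
  assert(bytes != nullptr || size == 0);
  if (offset > size) offset = size;
  LineColumn pos = {1, 1};
  for (std::size_t i = 0; i < offset; ++i) {
    if (bytes[i] == '\n') {
      ++pos.line;
      pos.column = 1;
    } else {
      ++pos.column;
    }
  }
  return pos;
}
```

With a tokenizer `S` that has just produced an error, the call would look like `lineColumnAt(S.inputBytes(), S.inputSize(), S.inputOffset())`. Columns are counted in bytes, so multi-byte UTF-8 sequences advance the column by more than one.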
Token types (`jsont::Token`):

- `End`: Input ended
- `ObjectStart`: {
- `ObjectEnd`: }
- `ArrayStart`: [
- `ArrayEnd`: ]
- `True`: true
- `False`: false
- `Null`: null
- `Integer`: number value without a fraction part (access as int64 through `Tokenizer::intValue()`)
- `Float`: number value with a fraction part (access as double through `Tokenizer::floatValue()`)
- `String`: string value (access value through `Tokenizer::stringValue()` et al)
- `FieldName`: field name (access value through `Tokenizer::stringValue()` et al)
- `Error`: an error occurred (access error code through `Tokenizer::error()` et al)
Text encodings (`TextEncoding`):

- `UTF8TextEncoding`: Unicode UTF-8 text encoding
Error codes (`ErrorCode`):

- `UnspecifiedError`: Unspecified error
- `UnexpectedComma`: Unexpected comma
- `UnexpectedTrailingComma`: Unexpected trailing comma
- `InvalidByte`: Invalid input byte
- `PrematureEndOfInput`: Premature end of input
- `MalformedUnicodeEscapeSequence`: Malformed Unicode escape sequence
- `MalformedNumberLiteral`: Malformed number literal
- `UnterminatedString`: Unterminated string
- `SyntaxError`: Illegal JSON (syntax error)
`Builder` aids in building JSON, providing a final sequential byte buffer.

- `Builder()`: Initializes a new builder with an empty backing buffer.
- `Builder& startObject()`: Starts an object (appends a `'{'` character to the backing buffer).
- `Builder& endObject()`: Ends an object (a `'}'` character).
- `Builder& startArray()`: Starts an array (`'['`).
- `Builder& endArray()`: Ends an array (`']'`).
- `const void reset()`: Resets the builder to its neutral state. Note that the backing buffer is reused in this case.

- `Builder& fieldName(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding)`: Adds a field name by copying `length` bytes from `v`.
- `Builder& fieldName(const std::string& name, TextEncoding encoding=UTF8TextEncoding)`: Adds a field name by copying `name`.
- `Builder& value(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding)`: Adds a string value by copying `length` bytes from `v`, whose content is encoded according to `encoding`.
- `Builder& value(const char* v)`: Adds a string value by copying `strlen(v)` bytes from c-string `v`. Uses the default encoding of `value(const char*,size_t,TextEncoding)`.
- `Builder& value(const std::string& v)`: Adds a string value by copying `v`. Uses the default encoding of `value(const char*,size_t,TextEncoding)`.
- `Builder& value(double v)`: Adds a possibly fractional number.
- `Builder& value(int64_t v)`, `void value(int v)`, `void value(unsigned int v)`, `void value(long v)`: Adds an integer number.
- `Builder& value(bool v)`: Adds the "true" or "false" atom, depending on `v`.
- `Builder& nullValue()`: Adds the "null" atom.

- `size_t size() const`: Number of readable bytes at the pointer returned by `bytes()`.
- `const char* bytes() const`: Pointer to the backing buffer, holding the resulting JSON.
- `std::string toString() const`: Returns a `std::string` object holding a copy of the backing buffer, representing the JSON.
- `const char* seizeBytes(size_t& size_out)`: "Steals" the backing buffer. After this call, the caller is responsible for calling `free()` on the returned pointer. Returns NULL on failure. Sets the value of `size_out` to the number of readable bytes at the returned pointer. The builder is reset and ready to use again, acting on a new backing buffer.
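Because `seizeBytes` hands ownership of a `free()`-managed buffer to the caller, it can be convenient to wrap the returned pointer so that it is released automatically. The sketch below shows one way to do that with `std::unique_ptr` (C++11). `FreeDeleter`, `OwnedBytes`, `seizeOwned` and `exampleUsage` are hypothetical names, and `StandInBuilder` is only a stand-in that models the `seizeBytes` contract described above so that the sketch compiles on its own; it is not jsont's `Builder`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Deleter that releases a buffer obtained from seizeBytes() with free().
struct FreeDeleter {
  void operator()(const char* p) const { std::free(const_cast<char*>(p)); }
};
typedef std::unique_ptr<const char, FreeDeleter> OwnedBytes;

// Stand-in for jsont::Builder (assumption: only the documented
// seizeBytes() behavior is modeled here).
class StandInBuilder {
 public:
  explicit StandInBuilder(const std::string& json) : json_(json) {}

  const char* seizeBytes(std::size_t& size_out) {
    // Allocate at least one byte so an empty buffer is not reported as NULL.
    char* buf = static_cast<char*>(std::malloc(json_.empty() ? 1 : json_.size()));
    if (buf == nullptr) return nullptr;
    std::memcpy(buf, json_.data(), json_.size());
    size_out = json_.size();
    json_.clear();  // the builder is reset after its buffer is taken
    return buf;
  }

 private:
  std::string json_;
};

// Take ownership of a builder's buffer; free() runs when the result
// goes out of scope. Works with any type exposing seizeBytes(size_t&).
template <typename BuilderT>
OwnedBytes seizeOwned(BuilderT& builder, std::size_t& size_out) {
  return OwnedBytes(builder.seizeBytes(size_out));
}

// Usage: seize the buffer, copy it out, and let OwnedBytes free it.
inline std::string exampleUsage() {
  StandInBuilder builder("{\"foo\":1}");
  std::size_t size = 0;
  OwnedBytes bytes = seizeOwned(builder, size);
  assert(bytes.get() != nullptr);  // NULL would mean seizeBytes() failed
  return std::string(bytes.get(), size);
}
```

The seized buffer is described by a pointer and a size, so the sketch always passes the size along rather than treating the pointer as a NUL-terminated string.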
C API types:

- `jsont_ctx_t`: A tokenizer context (an "instance" in OOP lingo).
- `jsont_tok_t`: A token type (see "Token types").
- `jsont_err_t`: A user-configurable error type, which defaults to `const char*`.

C API functions:

- `jsont_ctx_t* jsont_create(void* user_data)`: Creates a new JSON tokenizer context.
- `void jsont_destroy(jsont_ctx_t* ctx)`: Destroys a JSON tokenizer context.
- `void jsont_reset(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length)`: Resets the tokenizer to parse the data pointed to by `bytes`.

- `jsont_tok_t jsont_next(jsont_ctx_t* ctx)`: Reads and returns the next token.
- `jsont_tok_t jsont_current(const jsont_ctx_t* ctx)`: Returns the current token (the last token read by `jsont_next`).

- `int64_t jsont_int_value(jsont_ctx_t* ctx)`: Returns the current integer value.
- `double jsont_float_value(jsont_ctx_t* ctx)`: Returns the current floating-point number value.
- `size_t jsont_data_value(jsont_ctx_t* ctx, const uint8_t** bytes)`: Returns a slice of the input which represents the current value.
- `char* jsont_strcpy_value(jsont_ctx_t* ctx)`: Retrieves a newly allocated c-string.
- `bool jsont_data_equals(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length)`: Returns true if the current data value is equal to `bytes` of `length`.
- `bool jsont_str_equals(jsont_ctx_t* ctx, const char* str)`: Returns true if the current data value is equal to c-string `str`.
Note that the data is not parsed until you call one of these functions. This means that if you know that a value transferred as a string will fit in a 64-bit signed integer, it's completely valid to call jsont_int_value to parse the string as an integer.
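To illustrate what converting on demand means, here is a minimal sketch of turning a byte slice into a signed 64-bit integer only when the caller asks for it. It is not jsont's implementation: `parseInt64Slice` is a hypothetical helper, and it assumes the slice holds only an optional `-` followed by ASCII digits. In real code, `jsont_int_value` performs the conversion for you.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of jsont: convert a slice of ASCII
// digits (optionally prefixed by '-') to int64_t on demand.
// Returns false for empty input, non-digit bytes, or overflow.
inline bool parseInt64Slice(const uint8_t* bytes, size_t length, int64_t* out) {
  assert(out != nullptr);
  if (bytes == nullptr || length == 0) return false;

  size_t i = 0;
  bool negative = false;
  if (bytes[0] == '-') {
    negative = true;
    i = 1;
  }
  if (i == length) return false;  // a lone "-" is not a number

  // Largest magnitude that still fits: 2^63 for negatives, 2^63 - 1 otherwise.
  const uint64_t limit =
      negative ? (uint64_t(1) << 63) : (uint64_t(1) << 63) - 1;

  uint64_t acc = 0;
  for (; i < length; ++i) {
    if (bytes[i] < '0' || bytes[i] > '9') return false;
    const uint64_t digit = uint64_t(bytes[i] - '0');
    if (acc > (limit - digit) / 10) return false;  // would exceed limit
    acc = acc * 10 + digit;
  }

  if (negative) {
    // 2^63 has no positive int64_t counterpart, so handle it separately.
    *out = (acc == limit) ? INT64_MIN : -static_cast<int64_t>(acc);
  } else {
    *out = static_cast<int64_t>(acc);
  }
  return true;
}
```

The slice is never copied or modified, which matches the idea that the tokenizer only records where a value is and defers the conversion until a value function is called.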
- `uint8_t jsont_current_byte(jsont_ctx_t* ctx)`: Get the last byte read.
- `size_t jsont_current_offset(jsont_ctx_t* ctx)`: Get the current offset of the last byte read.
- `jsont_err_t jsont_error_info(jsont_ctx_t* ctx)`: Get information on the last error.
- `void* jsont_user_data(const jsont_ctx_t* ctx)`: Returns the value passed to `jsont_create`.
Token types (`jsont_tok_t`):

- `JSONT_END`: Input ended.
- `JSONT_ERR`: Error. Retrieve details through `jsont_error_info`.
- `JSONT_OBJECT_START`: {
- `JSONT_OBJECT_END`: }
- `JSONT_ARRAY_START`: [
- `JSONT_ARRAY_END`: ]
- `JSONT_TRUE`: true
- `JSONT_FALSE`: false
- `JSONT_NULL`: null
- `JSONT_NUMBER_INT`: number value without a fraction part (access through `jsont_int_value` or `jsont_float_value`)
- `JSONT_NUMBER_FLOAT`: number value with a fraction part (access through `jsont_float_value`)
- `JSONT_STRING`: string value (access through `jsont_data_value` or `jsont_strcpy_value`)
- `JSONT_FIELD_NAME`: field name (access through `jsont_data_value` or `jsont_strcpy_value`)
- See `example*.c` for working sample programs.
- See `LICENSE` for the MIT-style license under which this project is licensed.