JSON Tokenizer (jsont)

A minimal and portable JSON tokenizer written in standard C and C++ (two separate versions). Performs validating and highly efficient parsing suitable for reading JSON directly into custom data structures. There are no code dependencies — simply include jsont.{h,hh,c,cc} in your project.

Build and run unit tests:

```
make
```

Synopsis

C API:

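```c
#include "jsont.h"
// inbuf: const uint8_t*, inbuf_len: size_t
jsont_ctx_t* S = jsont_create(0);
jsont_reset(S, inbuf, inbuf_len);
jsont_tok_t tok;
while ((tok = jsont_next(S)) != JSONT_END) {
  if (tok == JSONT_ERR) break;  // details via jsont_error_info(S)
  // branch on `tok`; read values with jsont_*_value(S[, ...])
}
jsont_destroy(S);
```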

New C++ API:

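```cpp
#include <cstdio>
#include <iostream>
#include "jsont.hh"

// Tokenizing: inbuf is a const char*, length its size in bytes
jsont::Tokenizer S(inbuf, length, jsont::UTF8TextEncoding);
jsont::Token token;
while ((token = S.next()) != jsont::End) {
  if (token == jsont::Float) {
    printf("%g\n", S.floatValue());
  } else if (token == jsont::Error) {
    // handle error, e.g. inspect S.error() and S.errorMessage()
    break;
  }
  // ... branch on other token types
}

// Building
jsont::Builder json;
json.startObject()
    .fieldName("foo").value(123.45)
    .fieldName("bar").startArray()
      .value(678)
      .value("nine \"ten\"")
    .endArray()
  .endObject();
std::cout << json.toString() << std::endl;
// {"foo":123.45,"bar":[678,"nine \"ten\""]}
```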

API overview

See jsont.h and jsont.hh for the complete API, including more detailed documentation. Here's an overview:

C++ API `namespace jsont`

Builder build() — convenience builder factory

class Tokenizer

Reads a sequence of bytes and produces tokens and values while doing so.

Tokenizer(const char* bytes, size_t length, TextEncoding encoding) — Initialize a new Tokenizer that reads `length` bytes from `bytes`, interpreted according to `encoding`.
void reset(const char* bytes, size_t length, TextEncoding encoding) — Reset the tokenizer to read new input. Reusing a tokenizer this way avoids unnecessary memory allocation and deallocation.

Reading tokens

const Token& next() throw(Error) — Read next token, possibly throwing an Error
const Token& current() const — Access current token

Reading values

bool hasValue() const — True if the current token has a value
size_t dataValue(const char const** bytes) — Returns a slice of the input which represents the current value, or nothing (returns 0) if the current token has no value (e.g. start of an object).
std::string stringValue() const — Returns a copy of the current string value.
double floatValue() const — Returns the current value as a double-precision floating-point number.
int64_t intValue() const — Returns the current value as a signed 64-bit integer.

Handling errors

ErrorCode error() const — Returns the error code of the last error
const char* errorMessage() const — Returns a human-readable message for the last error. Never returns NULL.

Accessing underlying input buffer

const char* inputBytes() const — A pointer to the input data as passed to reset or the constructor.
size_t inputSize() const — Total number of input bytes
size_t inputOffset() const — The tokenizer's current byte offset into the input. In the event of an error, this points to the source of the error.
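Byte offsets are precise but awkward to show to a human. Below is a minimal sketch of turning an offset (for example the value of inputOffset() after an Error token, or jsont_current_offset() in the C API) into a 1-based line and column. `offset_to_line_col` is a hypothetical helper, not part of jsont; it assumes lines end in `\n` and counts columns in bytes, not characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of jsont: converts a byte offset into the
   input into a 1-based line and column. Lines are split on '\n' and the
   column is counted in bytes, so a multi-byte UTF-8 character counts as
   several columns. Offsets past the end are clamped to the end. */
void offset_to_line_col(const char* input, size_t input_len, size_t offset,
                        size_t* line_out, size_t* col_out) {
  size_t line = 1, col = 1;
  assert(input != NULL || input_len == 0);
  if (offset > input_len) offset = input_len;
  for (size_t i = 0; i < offset; ++i) {
    if (input[i] == '\n') {
      ++line;
      col = 1;
    } else {
      ++col;
    }
  }
  *line_out = line;
  *col_out = col;
}
```

For the input `{\n  "a": tru\n}`, offset 9 (the `t` of the misspelled `true`) maps to line 2, column 8.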

enum Token

End — Input ended
ObjectStart — {
ObjectEnd — }
ArrayStart — [
ArrayEnd — ]
True — true
False — false
Null — null
Integer — number value without a fraction part (access as int64 through Tokenizer::intValue())
Float — number value with a fraction part (access as double through Tokenizer::floatValue())
String — string value (access value through Tokenizer::stringValue() et al)
FieldName — field name (access value through Tokenizer::stringValue() et al)
Error — an error occurred (access error code through Tokenizer::error() et al)

enum TextEncoding

UTF8TextEncoding — Unicode UTF-8 text encoding

enum Tokenizer::ErrorCode

UnspecifiedError — Unspecified error
UnexpectedComma — Unexpected comma
UnexpectedTrailingComma — Unexpected trailing comma
InvalidByte — Invalid input byte
PrematureEndOfInput — Premature end of input
MalformedUnicodeEscapeSequence — Malformed Unicode escape sequence
MalformedNumberLiteral — Malformed number literal
UnterminatedString — Unterminated string
SyntaxError — Illegal JSON (syntax error)
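MalformedUnicodeEscapeSequence refers to JSON's `\uXXXX` form, which RFC 8259 defines as a backslash and `u` followed by exactly four hexadecimal digits. The sketch below shows what checking such an escape involves; `decode_u_escape` is a hypothetical helper, not jsont's implementation, and it does not combine UTF-16 surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not jsont's implementation: decodes the four hex
   digits that must follow backslash-u in a JSON string (RFC 8259).
   Returns the UTF-16 code unit (0x0000 to 0xFFFF), or -1 if fewer than
   four bytes are available or any of them is not a hex digit.
   Surrogate pairs are not combined here. */
long decode_u_escape(const char* p, size_t avail) {
  long value = 0;
  assert(p != NULL || avail == 0);
  if (avail < 4) return -1;
  for (int i = 0; i < 4; ++i) {
    char c = p[i];
    int digit;
    if (c >= '0' && c <= '9') digit = c - '0';
    else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
    else return -1;  /* not a hex digit: malformed escape */
    value = value * 16 + digit;
  }
  return value;
}
```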

class Builder

Aids in building JSON, providing a final sequential byte buffer.

Builder() — initialize a new builder with an empty backing buffer
Builder& startObject() — Start an object (appends a '{' character to the backing buffer)
Builder& endObject() — End an object (a '}' character)
Builder& startArray() — Start an array ('[')
Builder& endArray() — End an array (']')
const void reset() — Reset the builder to its neutral state. Note that the backing buffer is reused in this case.
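The start/end calls above imply that a builder must track nesting so it can place commas between siblings but not after a field name. One way to do that is sketched below using a hypothetical fixed-size `sketch_builder`; jsont's Builder is not claimed to work this way, and the sketch skips string escaping and buffer growth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not jsont's Builder: one "need comma" flag per
   nesting level decides whether a ',' goes before the next item.
   Fixed sizes keep it short; output is truncated if the buffer fills. */
#define SK_MAX_DEPTH 32
#define SK_BUF_SIZE 256

typedef struct {
  char buf[SK_BUF_SIZE];
  size_t len;
  bool need_comma[SK_MAX_DEPTH];  /* per level: an item was already written */
  int depth;                      /* 0 = top level */
  bool after_field_name;          /* next item completes a "name": pair */
} sketch_builder;

void sk_init(sketch_builder* b) { memset(b, 0, sizeof *b); }

static void sk_append(sketch_builder* b, const char* s) {
  size_t n = strlen(s);
  if (b->len + n >= SK_BUF_SIZE) n = SK_BUF_SIZE - 1 - b->len;
  memcpy(b->buf + b->len, s, n);
  b->len += n;
  b->buf[b->len] = '\0';
}

/* Writes ',' if this level already has an item, unless the item is the
   value belonging to a field name that was just written. */
static void sk_before_item(sketch_builder* b) {
  if (b->after_field_name) {
    b->after_field_name = false;
    return;
  }
  if (b->need_comma[b->depth]) sk_append(b, ",");
  b->need_comma[b->depth] = true;
}

void sk_start(sketch_builder* b, const char* open) {  /* "{" or "[" */
  sk_before_item(b);
  sk_append(b, open);
  assert(b->depth + 1 < SK_MAX_DEPTH);
  b->need_comma[++b->depth] = false;
}

void sk_end(sketch_builder* b, const char* close) {  /* "}" or "]" */
  assert(b->depth > 0);
  --b->depth;
  sk_append(b, close);
}

void sk_field_name(sketch_builder* b, const char* name) {  /* not escaped */
  sk_before_item(b);
  sk_append(b, "\"");
  sk_append(b, name);
  sk_append(b, "\":");
  b->after_field_name = true;
}

void sk_raw_value(sketch_builder* b, const char* literal) {  /* e.g. "678" */
  sk_before_item(b);
  sk_append(b, literal);
}
```

Calling sk_start with "{", then sk_field_name "foo", sk_raw_value "123.45", sk_field_name "bar", sk_start "[", sk_raw_value "678" and "true", sk_end "]" and sk_end "}" leaves `{"foo":123.45,"bar":[678,true]}` in the buffer.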

Building

Builder& fieldName(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding) — Adds a field name by copying length bytes from v.
Builder& fieldName(const std::string& name, TextEncoding encoding=UTF8TextEncoding) — Adds a field name by copying name.
Builder& value(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding) — Adds a string value by copying length bytes from v, whose content is encoded according to encoding.
Builder& value(const char* v) — Adds a string value by copying strlen(v) bytes from c-string v. Uses the default encoding of value(const char*,size_t,TextEncoding).
Builder& value(const std::string& v) — Adds a string value by copying v. Uses the default encoding of value(const char*,size_t,TextEncoding).
Builder& value(double v) — Adds a possibly fractional number
Builder& value(int64_t v), Builder& value(int v), Builder& value(unsigned int v), Builder& value(long v) — Adds an integer number
Builder& value(bool v) — Adds the "true" or "false" atom, depending on v
Builder& nullValue() — Adds the "null" atom
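The string overloads above have to turn arbitrary text into a valid JSON string literal, which is why the synopsis output shows `"nine \"ten\""`. Below is a minimal sketch of that escaping following RFC 8259, which requires escaping `"`, `\` and control characters below U+0020. `json_quote` is hypothetical, and the exact set of characters jsont's Builder escapes (for example `/` or non-ASCII code points) may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not jsont's code: writes len bytes of UTF-8 text
   from `in` to `out` as a quoted JSON string literal, escaping '"', '\\'
   and control characters below 0x20 as RFC 8259 requires. Other bytes,
   including multi-byte UTF-8 sequences, are copied unchanged. `out` must
   hold at least 6 * len + 3 bytes (worst case every byte becomes a
   six-character escape, plus two quotes and a NUL). Returns the length
   written, excluding the NUL. */
size_t json_quote(const char* in, size_t len, char* out) {
  static const char hex[] = "0123456789abcdef";
  size_t n = 0;
  assert(out != NULL);
  out[n++] = '"';
  for (size_t i = 0; i < len; ++i) {
    unsigned char c = (unsigned char)in[i];
    switch (c) {
      case '"':  out[n++] = '\\'; out[n++] = '"';  break;
      case '\\': out[n++] = '\\'; out[n++] = '\\'; break;
      case '\n': out[n++] = '\\'; out[n++] = 'n';  break;
      case '\r': out[n++] = '\\'; out[n++] = 'r';  break;
      case '\t': out[n++] = '\\'; out[n++] = 't';  break;
      default:
        if (c < 0x20) {  /* other control chars: backslash, 'u', "00", 2 hex */
          out[n++] = '\\'; out[n++] = 'u'; out[n++] = '0'; out[n++] = '0';
          out[n++] = hex[c >> 4]; out[n++] = hex[c & 0x0F];
        } else {
          out[n++] = (char)c;
        }
    }
  }
  out[n++] = '"';
  out[n] = '\0';
  return n;
}
```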

Managing the result

size_t size() const — Number of readable bytes at the pointer returned by bytes()
const char* bytes() const — Pointer to the backing buffer, holding the resulting JSON.
std::string toString() const — Return a std::string object holding a copy of the backing buffer, representing the JSON.
const char* seizeBytes(size_t& size_out) — "Steal" the backing buffer. After this call, the caller is responsible for calling free() on the returned pointer. Returns NULL on failure. Sets size_out to the number of readable bytes at the returned pointer. The builder is then reset and ready for reuse with a new backing buffer.
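The seizeBytes() contract is a plain ownership transfer. The sketch below illustrates it with a hypothetical growable buffer (`growbuf`, `buf_append` and `buf_seize` are not jsont names): after seizing, the caller holds the only pointer to the memory and must free() it, while the buffer forgets it and allocates fresh storage on the next append. Unlike seizeBytes(), this sketch also returns NULL when nothing was ever appended.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, not jsont's Builder: illustrates the
   ownership contract described for seizeBytes(). */
typedef struct {
  char* data;
  size_t size;
  size_t cap;
} growbuf;

/* Appends n bytes; returns 0 on success, -1 if allocation fails
   (the existing contents stay valid in that case). */
int buf_append(growbuf* b, const char* s, size_t n) {
  if (n == 0) return 0;
  if (b->size + n > b->cap) {
    size_t cap = b->cap ? b->cap : 64;
    while (cap < b->size + n) cap *= 2;
    char* p = realloc(b->data, cap);
    if (p == NULL) return -1;
    b->data = p;
    b->cap = cap;
  }
  memcpy(b->data + b->size, s, n);
  b->size += n;
  return 0;
}

/* Hands the storage to the caller, who must free() it, and resets the
   buffer so the next append allocates new storage. */
char* buf_seize(growbuf* b, size_t* size_out) {
  char* p = b->data;
  assert(size_out != NULL);
  *size_out = b->size;
  b->data = NULL;  /* forget the pointer: the caller owns it now */
  b->size = 0;
  b->cap = 0;
  return p;
}
```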

C API

Types

jsont_ctx_t — A tokenizer context ("instance" in OOP lingo.)
jsont_tok_t — A token type (see "Token types".)
jsont_err_t — A user-configurable error type, which defaults to const char*.

Managing a tokenizer context

jsont_ctx_t* jsont_create(void* user_data) — Create a new JSON tokenizer context.
void jsont_destroy(jsont_ctx_t* ctx) — Destroy a JSON tokenizer context.
void jsont_reset(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length) — Reset the tokenizer to parse the data pointed to by bytes.

Dealing with tokens

jsont_tok_t jsont_next(jsont_ctx_t* ctx) — Read and return the next token.
jsont_tok_t jsont_current(const jsont_ctx_t* ctx) — Returns the current token (last token read by jsont_next).

Accessing and comparing values

int64_t jsont_int_value(jsont_ctx_t* ctx) — Returns the current integer value.
double jsont_float_value(jsont_ctx_t* ctx) — Returns the current floating-point number value.
size_t jsont_data_value(jsont_ctx_t* ctx, const uint8_t** bytes) — Returns a slice of the input which represents the current value.
char* jsont_strcpy_value(jsont_ctx_t* ctx) — Retrieve a newly allocated c-string.
bool jsont_data_equals(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length) — Returns true if the current data value is equal to the length bytes starting at bytes.
bool jsont_str_equals(jsont_ctx_t* ctx, const char* str) — Returns true if the current data value is equal to c string str.
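jsont_data_value yields a slice of memory rather than a C string, so the slice is not NUL-terminated and comparisons must use its length. The hypothetical helpers below mirror the documented behavior of jsont_data_equals, jsont_str_equals and jsont_strcpy_value on such a slice. They are a sketch, not jsont's code, and they do not deal with escape sequences inside string values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers, not part of jsont: they operate on a
   (data, len) slice like the one jsont_data_value returns.
   The slice is not assumed to be NUL-terminated. */
bool slice_equals(const uint8_t* data, size_t len,
                  const uint8_t* bytes, size_t length) {
  return len == length && (len == 0 || memcmp(data, bytes, len) == 0);
}

bool slice_str_equals(const uint8_t* data, size_t len, const char* str) {
  return slice_equals(data, len, (const uint8_t*)str, strlen(str));
}

/* Returns a newly allocated, NUL-terminated copy that the caller must
   free(), or NULL if allocation fails. */
char* slice_strdup(const uint8_t* data, size_t len) {
  assert(data != NULL || len == 0);
  char* s = malloc(len + 1);
  if (s == NULL) return NULL;
  if (len > 0) memcpy(s, data, len);
  s[len] = '\0';
  return s;
}
```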

Note that the data is not parsed until you call one of these functions. This means that if you know that a value transferred as a string will fit in a 64-bit signed integer, it's completely valid to call jsont_int_value to parse the string as an integer.
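As a sketch of what parsing a string value as an integer involves, the hypothetical `slice_to_int64` below converts the bytes of a value such as the JSON string `"1234567890123"` into an int64_t, rejecting non-digits and out-of-range values. It is not jsont's parser, and how jsont_int_value itself reports invalid input is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not jsont's parser: accepts an optional leading
   '-' followed by one or more decimal digits, and rejects anything else
   as well as values outside the int64_t range. */
bool slice_to_int64(const uint8_t* p, size_t len, int64_t* out) {
  size_t i = 0;
  bool neg = false;
  uint64_t limit, acc = 0;
  assert(out != NULL);
  if (len > 0 && p[0] == '-') { neg = true; i = 1; }
  if (i == len) return false;  /* no digits */
  limit = neg ? (uint64_t)INT64_MAX + 1 : (uint64_t)INT64_MAX;
  for (; i < len; ++i) {
    if (p[i] < '0' || p[i] > '9') return false;
    uint64_t d = (uint64_t)(p[i] - '0');
    if (acc > (limit - d) / 10) return false;  /* acc * 10 + d > limit */
    acc = acc * 10 + d;
  }
  if (neg) {
    /* negate without overflow when acc == 2^63 */
    *out = acc == (uint64_t)INT64_MAX + 1 ? INT64_MIN : -(int64_t)acc;
  } else {
    *out = (int64_t)acc;
  }
  return true;
}
```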

Miscellaneous

uint8_t jsont_current_byte(jsont_ctx_t* ctx) — Get the last byte read.
size_t jsont_current_offset(jsont_ctx_t* ctx) — Get the current offset of the last byte read.
jsont_err_t jsont_error_info(jsont_ctx_t* ctx) — Get information on the last error.
void* jsont_user_data(const jsont_ctx_t* ctx) — Returns the value passed to jsont_create

Token types

JSONT_END — Input ended.
JSONT_ERR — Error. Retrieve details through jsont_error_info
JSONT_OBJECT_START — {
JSONT_OBJECT_END — }
JSONT_ARRAY_START — [
JSONT_ARRAY_END — ]
JSONT_TRUE — true
JSONT_FALSE — false
JSONT_NULL — null
JSONT_NUMBER_INT — number value without a fraction part (access through jsont_int_value or jsont_float_value)
JSONT_NUMBER_FLOAT — number value with a fraction part (access through jsont_float_value)
JSONT_STRING — string value (access through jsont_data_value or jsont_strcpy_value)
JSONT_FIELD_NAME — field name (access through jsont_data_value or jsont_strcpy_value)
