
a general purpose data format

Last year, mid-frustration with game and asset configuration in prototypes, I made KVL. It's a data storage format like JSON but without so many syntax quirks, like YAML but without the baggage of hidden complexity, like XML but without ... well, all of the horrid XML. After living with it for a while, I'd say it's probably one of the code things I'm happiest to have created, ever. Using any other JSON/YAML/TOML-like format always comes with some friction, and having a format that works with you is so lovely.

It won't be the perfect format for everything, for everyone, but you might look at some of the ideas and think of ways they could be transformed to work for you.

Example

We'll break down a simple example piece by piece. Here's a complete file.

Keys, values, objects

Let's start at not-the-beginning, with the player object.

Every declaration in the format is a key-value pair: an alphanumeric-and-underscore key, followed by a space, followed by some thing (our value). Keys can begin with any of those characters, so _this is a valid key and so is 0x213. One or more spaces (not line breaks) separate keys and values. There's no visual delimiter like JSON's colon.
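To make the key rules concrete, here's a minimal Python sketch of splitting one declaration into its key and its value text. It isn't KVL's actual reader: `split_declaration` and `KEY_RE` are hypothetical names, and it assumes keys are plain ASCII letters, digits and underscores.

```python
import re

# Hypothetical key rule: one or more ASCII letters, digits or underscores,
# in any order, so a key may start with "_" or with a digit.
KEY_RE = re.compile(r"[A-Za-z0-9_]+")


def split_declaration(line: str) -> tuple[str, str]:
    """Split 'key   value' into (key, value) on the first run of spaces."""
    match = KEY_RE.match(line)
    if match is None:
        raise ValueError(f"expected a key at the start of {line!r}")
    key = match.group(0)
    rest = line[match.end():]
    # At least one space (not a line break) must separate key and value.
    if not rest.startswith(" "):
        raise ValueError(f"expected a space after key {key!r}")
    return key, rest.lstrip(" ")
```

So `split_declaration("0x213   42")` gives `("0x213", "42")`, while `"a:1"` is rejected because there's no colon delimiter in KVL.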

All top-level declarations have to be objects: a pair of curly braces containing more declarations. Objects can be empty, as shown, but you might want a comment in there to say why it's empty. All comments are C-style /* like this */ but, unlike in C, they can be nested.
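Nesting is the one place KVL comments differ from C, so it's worth seeing how little it takes: a depth counter instead of stopping at the first `*/`. This hypothetical `strip_comments` is a sketch that assumes comment markers never appear inside quoted strings or data fields; a real reader would have to skip those first.

```python
def strip_comments(text: str) -> str:
    """Remove /* ... */ comments, allowing them to nest."""
    out = []
    depth = 0  # how many comments we're currently inside
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])  # only keep text outside every comment
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)
```

With a plain C-style scanner, `a /* x /* y */ z */ b` would leave `z */ b` behind; here the whole nested comment disappears.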

Keys have to be unique amongst their siblings, so we can't have two player keys side by side, but a player containing a player is fine.
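A sketch of the sibling rule, assuming (hypothetically) that a parsed object arrives as a list of `(key, value)` pairs, with nested objects as nested lists of pairs. `build_object` is an illustrative name. Each object gets its own fresh dict, so uniqueness is only checked among siblings.

```python
def build_object(pairs: list) -> dict:
    """Build an object from (key, value) pairs, rejecting duplicate siblings.

    In this hypothetical representation, a value that is itself a list of
    pairs is a nested object with its own, independent set of keys.
    """
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key {key!r} among siblings")
        obj[key] = build_object(value) if isinstance(value, list) else value
    return obj
```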

Objects can be nested to any level.

Numbers

Numbers are always treated as doubles, every time. There are no integers and no booleans. Valid declarations include:

Keeping all numbers as the same type and precision makes things less confusing overall. Hex notation is useful to have.
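A minimal sketch of "everything is a double", assuming hex is written with a `0x` prefix and letting Python's `float()` stand in for whatever decimal forms KVL actually accepts. `parse_number` is a hypothetical name.

```python
def parse_number(token: str) -> float:
    """Parse a numeric token into a double (a Python float).

    Hex (assumed 0x prefix) is handled explicitly; everything else goes to
    float(), which is an assumption standing in for KVL's decimal rules.
    """
    sign = -1.0 if token.startswith("-") else 1.0
    digits = token[1:] if token[:1] in ("+", "-") else token
    if digits[:2].lower() == "0x":
        # Hex values are still stored as doubles, not integers.
        return sign * float(int(digits[2:], 16))
    return float(token)
```

Whatever the notation, the caller only ever sees one type, which is the point of the rule above.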

Possible addition

Octal notation, but I haven't needed it.

Strings

We have three different forms of string notation.

Quoted strings are always denoted with double quotes, never apostrophes. Quotes within quoted strings can be escaped "like \"so\"".
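Reading a quoted string is a short loop. This sketch assumes a backslash makes the next character literal, which covers the `\"` case above; whether KVL supports other escapes such as `\n` isn't stated, so treat that part as an assumption. `read_quoted` is hypothetical.

```python
def read_quoted(text: str, start: int) -> tuple[str, int]:
    """Read a double-quoted string beginning at text[start].

    Returns the unescaped value and the index just past the closing quote.
    """
    if text[start] != '"':
        raise ValueError("quoted strings must start with a double quote")
    chars = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            chars.append(text[i + 1])  # assumed: take the escaped char literally
            i += 2
        elif ch == '"':
            return "".join(chars), i + 1
        else:
            chars.append(ch)
            i += 1
    raise ValueError("unterminated quoted string")
```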

We can have unquoted strings too. These aren't the same as keys: anything is allowed up to the next whitespace character, so <this/silly/thing> is a valid unquoted string. The only rule is that they can't start with a digit 0-9, because anything starting with a digit will be interpreted as a number.
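Because the rule is just "does it start with a digit", telling unquoted strings and numbers apart needs no lookahead. A sketch with a hypothetical `read_bare`; it follows the digit rule literally, so how a leading `-` is treated is left open.

```python
def read_bare(text: str, start: int) -> tuple[str, bool, int]:
    """Read an unquoted token up to the next whitespace character.

    Returns (token, is_number, end_index). Per the rule above, a token is a
    number exactly when its first character is an ASCII digit 0-9.
    """
    end = start
    while end < len(text) and not text[end].isspace():
        end += 1
    token = text[start:end]
    is_number = bool(token) and token[0] in "0123456789"
    return token, is_number, end
```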

The third form, for some reason called a data field, is the thing that makes this format work for me: clear multi-line strings, with syntax that's intuitive to anybody familiar with regex (^{ to start and }$ to end). The reader strips a single space character from the start and from the end, so ^{this}$ and ^{ this }$ are the same. But it won't strip line breaks.
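Here's how a reader might handle data fields, assuming "strip the first and last space character" means removing at most one space from each end. `read_data_field` is a hypothetical name, and the sketch ends the field at the first `}$`, which is exactly the limitation raised in the possible changes that follow.

```python
DATA_OPEN, DATA_CLOSE = "^{", "}$"


def read_data_field(text: str, start: int) -> tuple[str, int]:
    """Read a ^{ ... }$ data field starting at text[start].

    Returns the body and the index just past the closing }$.
    """
    if not text.startswith(DATA_OPEN, start):
        raise ValueError("data fields start with ^{")
    body_start = start + len(DATA_OPEN)
    close = text.find(DATA_CLOSE, body_start)
    if close == -1:
        raise ValueError("unterminated data field")
    body = text[body_start:close]
    # Strip at most one space from each end; line breaks are left alone.
    if body.startswith(" "):
        body = body[1:]
    if body.endswith(" "):
        body = body[:-1]
    return body, close + len(DATA_CLOSE)
```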

Possible changes

  • Apostrophe-quoted strings. Maybe useful; I kind of like the option in other formats, but usually end up confusing myself.
  • Maybe data fields shouldn't strip any white space. They're kinda meant to be completely raw.
  • Data fields should have additional notation in case a string needs to include ^{ or }$. This could be done by symmetrically extending the notation (^^^^{ to match }$$$$) or by adding custom notation (^mydelim{ to match }mydelim$).
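The symmetric caret idea is cheap to implement. This is a sketch of that proposal only, not something KVL does today: count the carets before `{`, then search for `}` followed by the same number of `$`. It also leaves whitespace untouched, in line with the "completely raw" idea. `read_extended_field` is hypothetical.

```python
def read_extended_field(text: str, start: int) -> tuple[str, int]:
    """Hypothetical: n carets before '{' must be matched by n dollars after '}'."""
    n = 0
    while start + n < len(text) and text[start + n] == "^":
        n += 1
    if n == 0 or text[start + n:start + n + 1] != "{":
        raise ValueError("expected ^...{ to open a data field")
    closer = "}" + "$" * n
    body_start = start + n + 1
    close = text.find(closer, body_start)
    if close == -1:
        raise ValueError(f"expected {closer} to close the data field")
    # Raw body: no whitespace stripping in this variant.
    return text[body_start:close], close + len(closer)
```

With two carets, a lone `}$` inside the body no longer ends the field.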

Arrays

Arrays use square bracket notation, and they're a bit special. We have arrays of objects and arrays of strings, and an array must consistently contain one type. An element that looks like a number is read using unquoted string syntax, so it comes back as a string.
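A sketch of those two rules, assuming a one-line array of bare words, and already-parsed elements represented as a Python `dict` (object) or `str`. Both function names are hypothetical, not KVL's API.

```python
def parse_bare_array(source: str) -> list[str]:
    """Parse a one-line array of unquoted words, e.g. '[ 1 2.5 three ]'.

    Every element stays a string, even '1' or '2.5'; converting them is
    left to whatever consumes the array.
    """
    source = source.strip()
    if not (source.startswith("[") and source.endswith("]")):
        raise ValueError("arrays are wrapped in [ ]")
    return source[1:-1].split()


def array_kind(items: list) -> str:
    """Classify a parsed array, enforcing that all elements share one type."""
    if not items:
        return "empty"
    if all(isinstance(item, dict) for item in items):
        return "objects"
    if all(isinstance(item, str) for item in items):
        return "strings"
    raise ValueError("arrays must contain only objects or only strings")
```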

This is all a bit questionable, but the language is shaped by usage and this covers most of what I need: arrays of objects half the time, and something else the rest of the time, which can be parsed from strings.

Possible changes

  • Mixed arrays, if there's ever a good use-case
  • Number arrays. These might be denoted with different syntax to distinguish them from unquoted strings, e.g. #[ 1 2 3 ]
  • Length declarations could extend from number array syntax, where 3#[ 12, 24, 36 ] could be a vec3.
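If the `#[ ]` and `3#[ ]` proposals were adopted, reading them could look something like this. Everything here is hypothetical, including accepting both spaces and commas as separators, since the two examples above use one of each.

```python
import re

# Hypothetical syntax from the proposals above: optional length, then #[ ... ].
NUMBER_ARRAY = re.compile(r"^(\d*)#\[(.*)\]$", re.DOTALL)


def parse_number_array(source: str) -> list[float]:
    """Parse '#[ 1 2 3 ]' or '3#[ 12, 24, 36 ]' into a list of doubles."""
    match = NUMBER_ARRAY.match(source.strip())
    if match is None:
        raise ValueError("expected an optional length followed by #[ ... ]")
    length_text, body = match.groups()
    values = [float(tok) for tok in body.replace(",", " ").split()]
    # A declared length (e.g. 3 for a vec3) must match the element count.
    if length_text and len(values) != int(length_text):
        raise ValueError(f"expected {length_text} numbers, got {len(values)}")
    return values
```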

That covers all of the basic types, which leaves us with this thing from the original example.

This is an optional section called the header. Unlike standard declarations, where the top level must always be an object, this section permits all declaration types. The use case has always been document metadata (the --- syntax might be familiar if you've used Markdown site generators for things like GitHub Pages). This section originally only allowed string values, but my needs changed and now anything goes.

Declarations in the header aren't siblings to root document objects, so these keys won't conflict with any actual data.
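One way to keep header keys from colliding with document data is to store them in separate maps. A sketch assuming the header is fenced by `---` lines at the very top of the file, like Markdown front matter; `Document` and `split_header` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Header and root are separate namespaces, so the same key can
    # appear in both without conflict.
    header: dict = field(default_factory=dict)
    root: dict = field(default_factory=dict)


def split_header(text: str) -> tuple[str, str]:
    """Split optional '---' header text from the document body.

    Assumes the header, if present, opens with a '---' line at the very
    top and closes at the next '---' line.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("header opened with --- but never closed")
```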

Possible extension

This syntax could be changed to split the document into sections.

I think this syntax would be useful to someone, but not me right now. It might be used for organising things like different levels within the same file where nesting objects might become a bit messy. It's an idea. The header is enough right now.

That's it for the fundamental declarations! But there's one more important thing.

Error reporting

Everything about this format is driven by use cases and by reducing friction compared to similar formats. It started as data storage for game prototypes, which I compile on the command line with clang, and clang-style error reporting is pleasant enough. Then the language evolved to become part of the build system, so clang-style error reporting made sense twice over.

Let's try to compile something invalid, an array missing its ending.

We get this back.

This is the worst case: there's no contextual information and the error message is printed twice. But it visually fits in with regular clang++-style messages and provides enough raw information (a line number and character position) to figure things out.
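The clang-style shape is `file:line:column: error: message`, optionally followed by the offending source line and a caret. Here's a minimal sketch of producing that from a character offset. `format_diagnostic` is hypothetical, it counts columns in characters (so tabs will misalign the caret), and it doesn't reproduce KVL's exact output.

```python
def format_diagnostic(path: str, source: str, offset: int, message: str) -> str:
    """Format a clang-style error for the character at source[offset]."""
    line = source.count("\n", 0, offset) + 1          # 1-based line number
    line_start = source.rfind("\n", 0, offset) + 1    # index where that line begins
    line_end = source.find("\n", offset)
    if line_end == -1:
        line_end = len(source)
    column = offset - line_start + 1                  # 1-based column
    caret = " " * (column - 1) + "^"
    return (f"{path}:{line}:{column}: error: {message}\n"
            f"{source[line_start:line_end]}\n{caret}")
```

Because the first line matches the format clang itself uses, editors and terminals that recognise clang diagnostics can usually jump straight to the location.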

Future

And that's it! Despite all the possible changes listed above, it will probably stay as it is for now :) Things like hard-typed vec2, vec3, etc. might become more important over time, but those cases are rare enough not to be a concern right now.

It's a nice format. I like it. Do you use a domain-specific language in your projects? If you're considering one I hope this helps; if you have an idea that could be useful, or see something disastrous, please let me know! I'd love to hear from people doing similar things ^-^