JSYNC

JSYNC 1.0 Specification

    {
        "!": "Meta",
        "Status":   "pre-Alpha",
        "Revision": "!date 18 June 2010",
        "Authors":  [
            "Ingy döt Net"
        ]
    }

Introduction

JSYNC (pronounced jay-sink; IPA: /ˈdʒeɪsingk/), stands for JavaScript and YAML Notation Coding. It is a data serialization language based on the JSON data interchange format, and the YAML serialization language. It takes the simplicity of JSON and adds just a few YAML concepts to become a complete serialization language.

YAML is a language that was first conceived in 2001. It started with the 3 basic data models of modern programming languages: mappings (aka objects/hashes/dictionaries), sequences (aka arrays/lists), and scalars (aka single values). It added a URL based type system and a simple reference notation. With just those simple primitives, YAML is able to serialize any computer data graph.

JSON was also started in 2001, albeit completely independent of YAML. It used a subset of the JavaScript data syntax to describe the same primitives as YAML: mappings, sequences and scalars. JSON is not a complete serialization language, nor was intended to be. It is a data interchange format for easily communicating common data structures.

Both in syntax and data model, it turns out that JSON is a proper subset of YAML. In other words, any YAML loader can be used to properly load (decode) any valid JSON stream. In its syntax, YAML is vastly more rich and complex than JSON, but in terms of the data model, there are only a handful of things missing from JSON to give it the full power of YAML. In other words, to make it a complete serialization language:

  • JSON has no node type system, beyond Mapping, Array, String, Number, Boolean and Null.

  • JSON has no node reference system.

  • JSON only allows Strings as mapping keys. YAML allows any node.

  • JSON only allows the encoding of top level mappings and sequences, not scalars.

  • JSON only allows exactly 1 top level node. YAML allows 0 or more.

JSYNC is a format that is 100% JSON, but adds these missing concepts to become a complete serialization language.

YAML is widely used by dynamic languages like Ruby, Python, Perl and PHP, but it has suffered to some degree because the quality of implementations has varied so widely. This is due to the fact that the YAML specification is quite complex and thus difficult to implement.

JSON, on the other hand has spread like wildfire through the above languages and dozens more. This is generally attributed to its simplicity and ease of implementation.

JSYNC hopes to leverage the power of both of these formats to provide a very simple and very interoperable serialization language.


About this Specification

This specification uses the (programming language agnostic) terminology set forth in the YAML specification. Please refer to it as a guide. One primary term that is used is "mapping", which is the same as "object" in JSON. In this spec, "mapping" is always used to refer to a collection of key/value pairs. "Object" is used in the Object Oriented sense, meaning an in-memory instance of a class.

For every possible JSYNC serialization, there is an equivalent YAML form. For this reason, JSYNC examples will often be shown next to their YAML equivalents. Therefore, knowledge of YAML is required to understand the full meaning of the examples.

To keep this specification reasonably simple, concepts that are defined in the YAML and JSON specifications are not fully respecified here.


JSYNC Design Goals

JSYNC attempts to:

  • Be a portable, language-independent data serialization language.

  • Be a proper superset of JSON.

  • Be a minimal extension of JSON.

  • Offer all the serialization capability of YAML.

JSYNC does not attempt to:

  • Be human friendly (readable/editable by everyone).

  • Be forgiving of syntax or semantic errors.

  • Add more YAML concepts than necessary.


Preview

This section gives a series of simple examples, to demonstrate the capabilities of JSYNC.

Simple Typed Mapping

This would be loaded into a given programming language environment as an instance object of a Soldier class:

{
    "!": "Soldier",
    "name": "Benjamin",
    "rank": "Private",
    "serial number": 123456789
}

Equivalent YAML:

--- !Soldier
name: Benjamin
rank: Private
serial number: 123456789

A Reference

He and she share the same car:

{
    "His car": {
        "&": "001",
        "make": "Volvo",
        "vin": "918273645"
    },
    "Her car": "*001"
}

YAML:

His car: &001
  make: Volvo
  vin: 918273645
Her car: *001

A Recursive Data Structure

Looking into a mirror infinitely:

{
    "&": "Mirror",
    "look": "*Mirror"
}

YAML:

--- &Mirror
look: *Mirror

Multiple Documents

If you have streaming implementations on both ends of a JSYNC communication, you could send and receive/process a non-terminating JSON stream. Here is how to send a stream of multiple top level documents in JSYNC:

[
    {"%JSYNC":"1.0"}
    ,{
        "!": "event",
        "coordinates": [10, 13]
    }
    ,{
        "!": "event",
        "coordinates": [10, 15],
    }
    ,{
        "!": "event",
        "coordinates": [10, 15.5],
    }
]

Note

The first mapping is special JSYNC meta information. See Directives below.

YAML:

--- !event
coordinates: [10, 13]
...
--- !event
coordinates: [10, 15]
...
--- !event
coordinates: [10, 15.5]
...

JSYNC Structural Concepts

This section introduces the concepts that JSYNC adds to JSON: Tags, Anchors, Aliases, Complex Keys and Directives. It also discusses the top level node rules that are added from YAML.

These concepts are all fully described in the YAML Spec, so please refer to that for the complete details.

Type Tags

Tags are URLs that denote data types. They are denoted by beginning with a "!" character.

A fully qualified tag URL looks like this:

!<tag:example.com,2010:Thing>

More often, tags are abbreviated to something that looks like one of these:

!example!Thing
!!Thing
!Thing

The tags are expanded by a JSYNC processor into their fully qualified forms, by %TAG directives (described below) or by configuring the processor directly in a program.

Anchors and Aliases

JSYNC uses YAML’s Anchor/Alias system to serialize multiple references to an identical node, including circular references. The first time such a node is serialized, it is marked with a unique string, preceded by a "&" character. This string is called an Anchor, and it looks like this:

&001

Subsequent serializations of the same node are identified by the same string preceded by a "*" character. This is called an Alias:

*001

Complex Mapping Keys

Any node can be used as a mapping key by first using the node as a mapping value whose key is of the form "&" plus an identifier. Then the alias string form can be used to reference it. The original key/value pair is not loaded as part of the graph, only the alias references are.

{
    "!": "DiceDistribution",
    "&11": [1, 1],
    "&66": [6, 6],
    "*11": 42,
    "*66": 53
}

YAML:

--- !DiceDistribution
[1, 1]: 42
[6, 6]: 53

Directives

A directive is a piece of information that gives the parser some extra information. YAML has only 2 directives:

%YAML 1.2
%TAG !foo! tag:foo.com,2009:
%TAG !bar! tag:bar.com,2010:

The %YAML directive indicates the YAML specification version used, and the TAG directive provides a way to turn tag abbreviations into fully qualified tags.

If a directive is needed in JSYNC, you wrap the entire stream with a sequence that has a special mapping as its first value. This mapping contains the directives, and it is required to have a %JSYNC key.

[
    {
        "%JSYNC": "1.0",
        "%TAG": {
            "!foo!": "tag:foo.com,2009:",
            "!bar!": "tag:bar.com,2010:"
        }
    },
    {
        "!": "foo!this",
        "some": {
            "!": "bar!that",
            "thing": "borrowed"
        }
    }
]

YAML:

%YAML 1.2
%TAG !foo! tag:foo.com,2009:
%TAG !bar! tag:bar.com,2010:
--- !foo!this
some: !bar!that
  thing: borrowed

Multiple Node Streams

Encoding zero or more top level nodes in JSYNC, uses the same wrapping mechanism described in the previous section. After the first special mapping in the sequence, each subsequent element represents a top level node.

[
    {"%JSYNC":"1.0"},
    {"!": "Message", "text": "Hello there"},
    {"!": "Message", "text": "O HAI"},
    {"!": "Message", "text": "KTHXBAI"}
]

YAML:

--- !Message
text: Hello there
--- !Message
text: O HAI
--- !Message
text: KTHXBAI

A minimal JSYNC serialization of zero top level nodes would be:

[{"%JSYNC":"1.0"}]

Top Level Scalars

JSYNC can further this wrapper notion by allowing top level scalars to be serialized. This is not possible in JSON.

[
    {"%JSYNC":"1.0"},
    "!Quote A rose by any other name would smell as sweet."
]

YAML:

--- !Quote A rose by any other name would smell as sweet.

Of course, multiple top level scalar nodes, or any combination of top level mappings, sequences and/or scalars is allowed.


JSYNC Syntax

Every JSYNC stream must be valid JSON syntax. JSYNC uses all of the JSON syntax, and adds nothing to it.

JSYNC adds extra information to mappings, sequences and scalars, using 3 simple and similar techniques. These techniques will be covered separately in the following 3 sections.

JSYNC Mappings

JSYNC adds extra information to JSON mappings by using 2 special mapping keys: "!" (for tag) and "&" (for anchor).

Note

To use these strings as actual keys, see JSYNC Escaping below.

For example:

{
    "!": "Fruit",
    "&": "001",
    "name": "apple",
    "color": "red"
}

YAML:

--- !Fruit &001
name: apple
color: red

JSYNC Sequences

JSYNC adds extra information to JSON sequences by using a special string in the first position of the sequence. The string must contain a tag, an anchor, or both (separated by a single space).

For example:

[
    "!Groceries &002",
    "Bread",
    "Milk",
    "Orange Juice"
]

YAML:

--- !Groceries &002
- Bread
- Milk
- Orange Juice

JSYNC Scalars

JSYNC adds extra information to JSON scalars by prepending a tag or an anchor (or both), each followed by a single space, to the start of a string scalar.

For example:

[
    "!Fruit apple",
    "!Fruit pear",
    "!Vegetable carrot",
    "!null "
]

YAML:

- !Fruit apple
- !Fruit pear
- !Vegetable carrot
- !null ''

Note

When tagging an empty string, a space is still required after the tag.

Note

In practice, programming languages do not care whether or not equivalent scalars are actually identical. Therefore, anchors are not typically used with scalar values.

JSYNC Escaping

JSYNC does not reserve any strings as special. In other words, you can serialize any scalar value in JSYNC. In order to distinguish literal text from their JSYNC equivalents, a "." is used as a prefix. Any string beginning with a "!", "&", "%", "*" (or contiguous "." characters followed by one of those four) must add a period to the start, on serialization. During deserialization, a starting period followed by one of those sequences is removed. A starting period not followed by one of those, is not removed.

Consider this YAML:

--- !T1 &A1
"!": .! .! .!
"&": ...&hmm
"%": ".1"
.: ...
"*A1": *A1

The equivalent JSYNC would be:

{
    "!": "T1",
    "&": "A1",
    ".!": "..! .! .!",
    ".&": "....&hmm",
    ".%": ".1",
    ".": "...",
    ".*A1": "*A1"
}

JSYNC API Specification

Note

This section is under heavy construction at the moment. Don’t take it too seriously yet.

It is strongly encouraged that all JSYNC implementations use the same API. This has been an adoption barrier in both YAML and JSON. To that end, the following API is suggested.

The JSYNC Processor Stack

A complete YAML implementation has the following processing stack. A complete JSYNC implementation could have the same.

Loader Stack        Memory Representation         Dumper Stack
==============================================================
      Loader                                      Dumper
               \                              /
                    (Native Data/Objects)
               /                              \
 Constructor                                      Representer
               \                              /
                     (Generic Node Graph)
               /                              \
    Composer      <-->     Resolver     <-->      Serializer
               \                              /
                         (Event Tree)
               /                              \
      Parser                                      Emitter
               \                              /
                        (Token Stream)
               /                              /
     Scanner
               \                              /
                      (Character Stream)
               /                              \
      Reader                                      Writer
               \                              /
                   (String or File Handle)

Typically a JSON implementation will simply have an encode() and decode() function set. The first implements Dumper→Writer and the other implements Reader→Loader, all in one atomic operation. This is very simple, although it prevents a lot of useful things being done in between ends.


Implementation Guide

Note

This section is under heavy construction at the moment. Don’t take it too seriously yet.

This section describes how to turn an existing JSON implementation into a JSYNC one.

…under construction…


Last updated 2010-11-13 21:40:56 PDT