Internals

This section records some design and implementation details.

Architecture

SAX and DOM

The basic relationship between SAX and DOM is shown in the following UML diagram.

architecture.png
Architecture UML class diagram

The core of the relationship is the Handler concept. On the SAX side, Reader parses JSON from a stream and publishes events to a Handler. Writer implements the Handler concept to handle the same set of events. On the DOM side, Document implements the Handler concept to build a DOM according to the events. Value supports a Value::Accept(Handler&) function, which traverses the DOM to publish events.

With this design, SAX does not depend on DOM. Even Reader and Writer have no dependencies between them. This provides the flexibility to chain event publishers and handlers. Value does not depend on SAX either. So, in addition to stringifying a DOM to JSON, a user may also stringify it to an XML writer, or do anything else.
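The decoupling described above can be sketched with a toy example. This is not RapidJSON code: Node, MakeInt, MakeArray, JsonTextHandler and XmlTextHandler are hypothetical names, and the event set is reduced to three events, assuming only that a publisher calls member functions on whatever handler it is given.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy DOM node (hypothetical, not RapidJSON's Value): an integer or an array.
struct Node {
    bool isArray;
    int number;
    std::vector<Node> items;

    // Walks the tree and publishes events to any type that provides
    // Int(), StartArray() and EndArray().
    template <typename Handler>
    void Accept(Handler& handler) const {
        if (!isArray) {
            assert(items.empty());  // an integer node has no children
            handler.Int(number);
            return;
        }
        handler.StartArray();
        for (const Node& item : items)
            item.Accept(handler);
        handler.EndArray();
    }
};

inline Node MakeInt(int i) { return Node{false, i, {}}; }
inline Node MakeArray(std::vector<Node> items) { return Node{true, 0, std::move(items)}; }

// Handler that renders the events as JSON text.
struct JsonTextHandler {
    std::string out;
    bool needComma = false;
    void Prefix() { if (needComma) out += ','; }
    void Int(int i) { Prefix(); out += std::to_string(i); needComma = true; }
    void StartArray() { Prefix(); out += '['; needComma = false; }
    void EndArray() { out += ']'; needComma = true; }
};

// Handler that renders the same events as XML-like text.
struct XmlTextHandler {
    std::string out;
    void Int(int i) { out += "<int>" + std::to_string(i) + "</int>"; }
    void StartArray() { out += "<array>"; }
    void EndArray() { out += "</array>"; }
};
```

The same Node can be passed to either handler without the node knowing which output format is produced, which is the kind of flexibility the Handler concept is meant to give.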

Utility Classes

Both the SAX and DOM APIs depend on three additional concepts: Allocator, Encoding and Stream. Their inheritance hierarchy is shown below.

utilityclass.png
Utility classes UML class diagram

Value

Value (actually a typedef of GenericValue<UTF8<>>) is the core of the DOM API. This section describes its design.

Data Layout

Value is a variant type. In RapidJSON's context, an instance of Value can contain one of six JSON value types. This is made possible by a union. Each Value contains two members: union Data data_ and an unsigned flags_. The flags_ member indicates the JSON type, along with additional information.

The following tables show the data layout of each type. The 32-bit and 64-bit columns indicate the size of each field in bytes.

Null                                                                    32-bit  64-bit
(unused)                                                                4       8
(unused)                                                                4       4
(unused)                                                                4       4
unsigned flags_     kNullType kNullFlag                                 4       4

Bool                                                                    32-bit  64-bit
(unused)                                                                4       8
(unused)                                                                4       4
(unused)                                                                4       4
unsigned flags_     kBoolType (either kTrueFlag or kFalseFlag)          4       4

String                                                                  32-bit  64-bit
Ch* str             Pointer to the string (may own)                     4       8
SizeType length     Length of string                                    4       4
(unused)                                                                4       4
unsigned flags_     kStringType kStringFlag ...                         4       4

Object                                                                  32-bit  64-bit
Member* members     Pointer to array of members (owned)                 4       8
SizeType size       Number of members                                   4       4
SizeType capacity   Capacity of members                                 4       4
unsigned flags_     kObjectType kObjectFlag                             4       4

Array                                                                   32-bit  64-bit
Value* values       Pointer to array of values (owned)                  4       8
SizeType size       Number of values                                    4       4
SizeType capacity   Capacity of values                                  4       4
unsigned flags_     kArrayType kArrayFlag                               4       4

Number (Int)                                                            32-bit  64-bit
int i               32-bit signed integer                               4       4
(zero padding)      0                                                   4       4
(unused)                                                                4       8
unsigned flags_     kNumberType kNumberFlag kIntFlag kInt64Flag ...     4       4

Number (Uint)                                                           32-bit  64-bit
unsigned u          32-bit unsigned integer                             4       4
(zero padding)      0                                                   4       4
(unused)                                                                4       8
unsigned flags_     kNumberType kNumberFlag kUintFlag kUint64Flag ...   4       4

Number (Int64)                                                          32-bit  64-bit
int64_t i64         64-bit signed integer                               8       8
(unused)                                                                4       8
unsigned flags_     kNumberType kNumberFlag kInt64Flag ...              4       4

Number (Uint64)                                                         32-bit  64-bit
uint64_t u64        64-bit unsigned integer                             8       8
(unused)                                                                4       8
unsigned flags_     kNumberType kNumberFlag kUint64Flag ...             4       4

Number (Double)                                                         32-bit  64-bit
double d            Double precision floating-point                     8       8
(unused)                                                                4       8
unsigned flags_     kNumberType kNumberFlag kDoubleFlag                 4       4

Here are some notes:

  • To reduce memory consumption on 64-bit architectures, SizeType is a typedef of unsigned instead of size_t.
  • Zero padding for a 32-bit number may be placed after or before the actual value, according to the endianness. This makes it possible to interpret a 32-bit integer as a 64-bit integer without any conversion.
  • An Int is always an Int64, but the converse is not always true.
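The zero-padding note can be illustrated with a small sketch. NumberSlot, StoreUint, LoadUint64, IsLittleEndian and CheckRoundTrip are hypothetical names, not RapidJSON's, and the sketch assumes a plain little-endian or big-endian machine. It uses memcpy rather than a union so that the reads are well defined in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte comes first in memory.
inline bool IsLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical 8-byte slot standing in for the number part of the union.
struct NumberSlot {
    unsigned char bytes[8];
};

// Stores a 32-bit value; the remaining four bytes become the zero padding.
inline NumberSlot StoreUint(std::uint32_t u) {
    NumberSlot slot;
    std::memset(slot.bytes, 0, sizeof slot.bytes);
    // Little-endian: padding after the value. Big-endian: padding before it.
    const std::size_t offset = IsLittleEndian() ? 0 : 4;
    std::memcpy(slot.bytes + offset, &u, sizeof u);
    return slot;
}

// Reads all eight bytes as a 64-bit integer, with no conversion step.
inline std::uint64_t LoadUint64(const NumberSlot& slot) {
    std::uint64_t u64 = 0;
    std::memcpy(&u64, slot.bytes, sizeof u64);
    return u64;
}

// Usage: a stored 32-bit value reads back unchanged as a 64-bit value.
inline void CheckRoundTrip(std::uint32_t u) {
    assert(LoadUint64(StoreUint(u)) == u);
}
```

Because the padding position follows the byte order, the 64-bit read yields the same numeric value on either kind of machine.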

Flags

The 32-bit flags_ contains both JSON type and other additional information. As shown in the above tables, each JSON type contains redundant kXXXType and kXXXFlag. This design is for optimizing the operation of testing bit-flags (IsNumber()) and obtaining a sequential number for each type (GetType()).
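A minimal sketch of how such a redundant encoding might look is given below. The numeric values, the kTypeMask name and the FlaggedValue struct are assumptions made for illustration; only the kXXXType and kXXXFlag naming pattern and the IsNumber() and GetType() operations come from the text.

```cpp
#include <cassert>

// Sequential type numbers kept in the low byte (illustrative values).
enum Type {
    kNullType = 0, kBoolType = 1, kObjectType = 2,
    kArrayType = 3, kStringType = 4, kNumberType = 5
};

// One bit per kind of value, above the low byte (illustrative values).
const unsigned kTypeMask   = 0xFF;   // hypothetical name for the type mask
const unsigned kStringFlag = 0x100;
const unsigned kNumberFlag = 0x200;

// Hypothetical stand-in for the flags_ member of Value.
struct FlaggedValue {
    unsigned flags_;

    // Sequential number: a single mask operation.
    Type GetType() const { return static_cast<Type>(flags_ & kTypeMask); }

    // Type test: a single bit test.
    bool IsNumber() const { return (flags_ & kNumberFlag) != 0; }
    bool IsString() const { return (flags_ & kStringFlag) != 0; }
};

// Usage: both pieces of information are stored together in one word.
inline void ExampleFlags() {
    const FlaggedValue number = { kNumberType | kNumberFlag };
    assert(number.IsNumber());
    assert(number.GetType() == kNumberType);
}
```

Storing both forms costs a few bits but keeps each query to one bitwise operation, which is the optimization the paragraph above refers to.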

String has two optional flags. kCopyFlag means that the string owns a copy of the string. kInlineStrFlag means using Short-String Optimization.

Number is a bit more complicated. For normal integer values, it can contain kIntFlag, kUintFlag, kInt64Flag and/or kUint64Flag, according to the range of the integer. Numbers with a fraction, and integers outside the 64-bit range, are stored as double with kDoubleFlag.
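The range-based choice of integer flags can be sketched as follows. The flag values and the FlagsForInt64 function are hypothetical, and the sketch only covers values that fit in a signed 64-bit integer. Unsigned values above that range would carry only kUint64Flag.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustrative bit values; only the flag names come from the text.
const unsigned kIntFlag    = 0x1;
const unsigned kUintFlag   = 0x2;
const unsigned kInt64Flag  = 0x4;
const unsigned kUint64Flag = 0x8;

// Returns every integer flag whose range contains v.
inline unsigned FlagsForInt64(std::int64_t v) {
    const std::int64_t int32Min  = std::numeric_limits<std::int32_t>::min();
    const std::int64_t int32Max  = std::numeric_limits<std::int32_t>::max();
    const std::int64_t uint32Max = std::numeric_limits<std::uint32_t>::max();

    unsigned flags = kInt64Flag;          // every input here fits in int64_t
    if (v >= 0) {
        flags |= kUint64Flag;             // non-negative also fits in uint64_t
        if (v <= uint32Max) flags |= kUintFlag;
        if (v <= int32Max)  flags |= kIntFlag;
    } else if (v >= int32Min) {
        flags |= kIntFlag;                // small negative values fit in int
    }
    return flags;
}

// Usage: a small non-negative value can be read through any integer getter.
inline void ExampleNumberFlags() {
    assert(FlagsForInt64(7) == (kIntFlag | kUintFlag | kInt64Flag | kUint64Flag));
}
```

With flags assigned this way, a check such as "is this an Int" is a single bit test, regardless of which constructor produced the value.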

Short-String Optimization

Kosta provided a very neat short-string optimization. The idea is as follows. Excluding flags_, a Value has 12 or 16 bytes (32-bit or 64-bit) for storing actual data. Instead of storing a pointer to a string, short strings can be stored directly in this space. For an encoding with a 1-byte character type (e.g. char), a string of up to 11 or 15 characters can be stored inside the Value.

ShortString (Ch=char)                                                   32-bit  64-bit
Ch str[MaxChars]    String buffer                                       11      15
Ch invLength        MaxChars - Length                                   1       1
unsigned flags_     kStringType kStringFlag ...                         4       4

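A special technique is applied. Instead of storing the length of the string directly, it stores (MaxChars - length). When the string is exactly MaxChars long, the stored value is 0, so the same byte also serves as the trailing \0. This makes it possible to store 11 characters (15 on 64-bit) together with a trailing \0.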

This optimization can reduce memory usage for copied strings. It can also improve cache locality and thus runtime performance.
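The inverted-length technique can be sketched as follows, assuming the 64-bit layout with Ch = char. The struct below is a simplified stand-in, not RapidJSON's implementation: it folds the string buffer and the invLength byte into one 16-byte array, and the member names Set, Length, Usable and c_str are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct ShortString {
    static constexpr std::size_t MaxChars = 15;  // would be 11 for the 32-bit layout

    // MaxChars characters, followed by one byte holding (MaxChars - length).
    char buf[MaxChars + 1];

    // True if a string of this length fits inline.
    static bool Usable(std::size_t length) { return length <= MaxChars; }

    void Set(const char* s, std::size_t length) {
        assert(Usable(length));
        std::memcpy(buf, s, length);
        buf[length] = '\0';  // terminator for strings shorter than MaxChars
        // For a full-length string this writes 0 into the same last byte,
        // so the inverted length doubles as the terminator.
        buf[MaxChars] = static_cast<char>(MaxChars - length);
    }

    std::size_t Length() const {
        return MaxChars - static_cast<unsigned char>(buf[MaxChars]);
    }

    const char* c_str() const { return buf; }
};

// Usage: a 15-character string fits in 16 bytes and stays null-terminated.
inline void ExampleShortString() {
    ShortString s;
    s.Set("123456789012345", 15);
    assert(s.Length() == 15);
    assert(std::strlen(s.c_str()) == 15);
}
```

Storing the plain length instead would leave no room for the terminator once the buffer is full, so one character of capacity would be lost.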