OML Measurement Stream Protocol (OMSP) Specification

This document is a work in progress aiming at properly specifying the OML Measurement Stream Protocol (OMSP) in its various flavours.

Generalities

The protocol is loosely modelled after HTTP. The client first starts with a few textual headers, then switches to either the text or the binary protocol to serialise tuples according to a previously advertised schema. Both modes include contextual information with each tuple. There is no feedback communication from the server.

The client first opens a connection to an OML server and sends a header followed by a sequence of measurement tuples. The header consists of a sequence of key/value pairs representing parameters valid for the whole connection, each terminated by a new line. The headers are also used to declare the schema following which measurement tuples will be streamed. The end of the header section is identified by an empty line. Each measurement tuple is then serialised following the mode selected in the headers. For the text mode, this is a series of newline-terminated strings containing tab-separated text-based serialisations of each element, while the binary mode encodes the data following a specific marshalling. Clients end the session by simply closing the TCP connection.
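
The session flow described above can be sketched as follows. This is a minimal illustration rather than a reference client: the function name `stream_session` is hypothetical, the caller is assumed to supply an already connected TCP socket, and UTF-8 encoding of the text is an assumption that this document does not state.

```python
import socket
from typing import Iterable


def stream_session(sock: socket.socket, headers: str, tuples: Iterable[str]) -> None:
    """Send the header block, then each serialised tuple, then close.

    `headers` must already end with the empty line that closes the header
    section; each item of `tuples` must already be newline-terminated.
    """
    try:
        # Headers first: key/value pairs, one per line, then an empty line.
        sock.sendall(headers.encode("utf-8"))  # UTF-8 is an assumption
        # Then the measurement tuples, in the mode announced by the headers.
        for line in tuples:
            sock.sendall(line.encode("utf-8"))
    finally:
        # The server sends no feedback; closing the connection ends the session.
        sock.close()
```

A caller would typically obtain the socket with `socket.create_connection((host, port))`, where `host` and `port` identify the collection point.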

There are 4 versions of the OML protocol. They are currently backward compatible.
  • OMSP V1 was the initial protocol, inherited from OML (version 1!);
  • OMSP V2 introduced more precise types (28daef3f), and was released with OML 2.4.0;
  • OMSP V3 introduced changes to the binary protocol to support blobs and, incidentally, longer marshalled packets (6d8f0597), and was released with OML 2.5.0;
  • OMSP V4 is currently in development, and anything referring to it in this document is not stable yet. Versions 1 to 3, however, are.

Schema Definition

Schemas describe the name, type and order of the values defining a sample in a measurement stream.

Schema declarations are a space-delimited sequence of name/type pairs. The name and type in each pair are separated by a colon (:).

Valid types in OMSP are the following.
  • int32 (V>=1)
  • uint32 (V>=2)
  • int64 (V>=2)
  • uint64 (V>=2)
  • double (V>=2)
  • string
  • blob (V>=3)
  • guid (V>=4)
  • bool (V>=4)
Additionally, some deprecated values are kept for backwards compatibility, and interpreted in the latest version as indicated. They should not be used in new implementations.
  • int (V<2, mapped to int32 in V>=3)
  • integer (V<2, mapped to int32 in V>=3)
  • long (V<2, clamped and mapped to int32 in V>=3)
  • float (V<2, mapped to double in V>=3)

A full schema also has a name, prepended to its definition and separated by a space. This must consist of only alphanumeric characters and underscores and must start with a letter or an underscore, i.e., matching /[_A-Za-z][_A-Za-z0-9]*/. The same rule applies to the names of the elements of the schema.
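
As an illustration of the naming rule, the check could be written as below. The helper name `is_valid_name` is hypothetical, and the pattern follows the prose rule (a letter or underscore, followed by any number of letters, digits or underscores), anchored to the whole name.

```python
import re

# Letter or underscore first, then any number of letters, digits, underscores.
NAME_RE = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` is acceptable as a schema or field name."""
    # fullmatch() anchors the pattern at both ends of the string.
    return NAME_RE.fullmatch(name) is not None
```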

Each client should number its measurement streams sequentially starting from 1 (not 0), and prepend that number to the corresponding schema definition. It is later used to label tuples following this schema, and allows them to be grouped together in the storage backend.

Example

1 generator_sin label:string phase:double value:double 
2 generator_lin label:string counter:long 
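
A receiver might parse such declarations roughly as follows. This is only a sketch: `Schema`, `parse_schema` and the two lookup tables are hypothetical names, the version constraints on each type are not enforced, and the clamping that applies to the deprecated long type concerns values rather than declarations, so it is not shown.

```python
from typing import List, NamedTuple, Tuple

# Types listed in this document, without the per-version restrictions.
VALID_TYPES = {"int32", "uint32", "int64", "uint64", "double",
               "string", "blob", "guid", "bool"}
# Deprecated names and the types they are interpreted as.
DEPRECATED_TYPES = {"int": "int32", "integer": "int32",
                    "long": "int32", "float": "double"}


class Schema(NamedTuple):
    index: int                     # stream number, starting from 1
    name: str                      # schema name
    fields: List[Tuple[str, str]]  # (field name, type) pairs, in order


def parse_schema(declaration: str) -> Schema:
    """Parse 'INDEX NAME field:type field:type ...' into a Schema."""
    parts = declaration.split()    # also drops a trailing space
    if len(parts) < 2:
        raise ValueError("schema needs at least an index and a name")
    index, name = int(parts[0]), parts[1]
    fields = []
    for pair in parts[2:]:
        field_name, _, field_type = pair.partition(":")
        field_type = DEPRECATED_TYPES.get(field_type, field_type)
        if not field_name or field_type not in VALID_TYPES:
            raise ValueError(f"invalid name/type pair: {pair!r}")
        fields.append((field_name, field_type))
    return Schema(index, name, fields)
```

Applied to the second line of the example, this yields index 2, name generator_lin, and the fields label (string) and counter (int32, mapped from the deprecated long).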

Schema 0 (OMSP V>=4)

Schema 0 is a specific hard-coded stream for metadata. Its core elements are two fields, named key and value. Data from this stream is stored in the same way as any other data, but its semantics are different in that it only describes and adds information about other measurement streams. Metadata follows a Subject-Key-Value model where the key/value pair is an attribute of a specific subject. Subjects are expressed in dotted notation. The default subject, ., is the experiment itself. At the second level are schemas, and their fields at the third level (e.g., .a refers to all of schema a, while .a.f refers only to its field f).

To support this, schema 0 is therefore:

0 _experiment_metadata subject:string key:string value:string 
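
The dotted subject notation could be decomposed as in the sketch below. The function name `parse_subject` is hypothetical, and rejecting subjects with more than three levels or with empty components is an assumption, since this document does not say how they should be handled.

```python
from typing import Optional, Tuple


def parse_subject(subject: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a metadata subject into (schema, field).

    "."    -> (None, None)   the experiment itself
    ".a"   -> ("a", None)    all of schema a
    ".a.f" -> ("a", "f")     field f of schema a
    """
    if not subject.startswith("."):
        raise ValueError("subjects start with a dot")
    parts = [] if subject == "." else subject[1:].split(".")
    # Assumption: deeper levels and empty components are treated as errors.
    if len(parts) > 2 or any(part == "" for part in parts):
        raise ValueError(f"malformed subject: {subject!r}")
    schema = parts[0] if len(parts) >= 1 else None
    field = parts[1] if len(parts) == 2 else None
    return schema, field
```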

On the server side, everything gets stored in the _experiment_metadata table. However, additional processing might happen. For example, if the key schema is defined for subject . (the experiment root), a new schema is defined at the collection point so that new measurement streams (MSs) can be sent.

In case of reconnection, it is up to the client whether or not to re-send these headers. This is particularly relevant if a new schema was defined later on. The server MAY store duplicate metadata if this happens.

Time-stamping and book-keeping

Prior to serialising tuples according to their schema, three elements are inserted.
  • timestamp: a double timestamp in seconds relative to the start-time sent in the headers;
  • stream_id: an integer (marshalled specifically as a uint8_t in binary mode) indicating which previously defined schema this tuple follows;
  • seq_no: an int32 monotonically increasing sequence number in the context of this measurement stream.

The order of these fields varies depending on the mode (text or binary).

Key/Value Parameters

The connection is initially configured by setting the values of a few properties, using a key/value model. The properties (and their keys) are the following.
  • protocol: OMSP version, as specified in this document. The oml2-server currently supports 1--4;
  • domain (experiment-id in V<4): string identifying the experimental group (should match /[-_A-Za-z0-9]+/);
  • start-time (start_time in V<4): local UNIX time in seconds taken at the time the header is being sent (see gettimeofday);
  • sender-id: string identifying the source of this stream (should match /[_A-Za-z0-9]+/);
  • app-name: string identifying the application producing the measurements (should match /[_A-Za-z0-9]+/); in the storage backend, this may be used to identify specific measurement collections (e.g., tables in SQL);
  • content: encoding of forthcoming tuples, can be either binary for the binary protocol or text for the text protocol.
  • schema: describes the schema of each measurement stream, as detailed previously.
  • These parameters can only be set as part of the headers, and are not valid once the server expects serialised measurements (V<4).
  • From V4 onwards, key/value metadata can be sent along with tuples using schema 0, as defined earlier. The key/value parameters presented here are all invalid in schema 0 and will be rejected by the server, except for the schema key itself, which allows schemata to be (re)defined (XXX including schema 0?).

Information storage

This section is only informative and describes the mapping from OMSP elements to database storage.

With the current SQL backends, the information is used or stored as follows (V<4; OML<2.10).

  • The protocol and content keys are specific to the protocol and never appear in the backend storage;
  • The domain is used to group measurements together (i.e., in the same database with that name);
  • The start-time of the earliest client (with some offset towards the past) is saved as a key/value pair in the _experiment_metadata table;
  • The sender-id is associated with a unique integer by the server. This mapping is stored in the _senders table, and reused to label tuples originating from this sender in other tables (oml_sender_id column); TODO can this be wrapped as metadata? Maybe not...
  • The app-name is used to name tables from a specific application by prepending it to the name of the Measurement Point (e.g., APPNAME_MPNAME); XXX This is actually done on the client side, app-name is never used by the collection point TODO Maybe we should only use the measurement point name, and store the sender-id/app-name together.
  • The schemas are used to define new tables to store measurement tuples, named as per the previous scheme; they are also stored in the _experiment_metadata table. These tables contain at least the following columns:
    • id a primary key for the table, each row has a different, monotonically increasing ID;
    • oml_sender_id an integer which can be found in the _senders table;
    • oml_seq a record of the seq_no sent with each tuple;
    • oml_ts_client the offset from start-time of when that tuple was serialised;
    • oml_ts_server the same offset rebased in the server's timeframe (by adding the difference of the server's time and the start-time header upon connection from the particular sender);
    • each element of the schema, in order, with a database type able to store the information of the OML type.

Protocol

This section describes the actual encoding of the elements described above. Key/value parameters go into the headers. Starting with V4, they can also be sent alongside measurement streams using schema 0. Then, depending on the chosen content, the text or binary mode is used for measurement tuples.

Headers

The header is text-based, and used to transfer the key/value parameters of the experiment, as defined earlier in this document.
All of them have to appear exactly once, in the order they were introduced in this document.
The only exception is the schema field which needs to appear once for every measurement stream carried by the connection.

Example

protocol: 3
experiment-id: ex1
start-time: 1281591603
sender-id: sender1
app-name: generator
schema: 1 generator_sin label:string phase:double value:double 
schema: 2 generator_lin label:string counter:long 
content: text
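
A header block like the one above could be assembled as sketched below. The function name `build_headers` and its parameters are hypothetical. The key order follows the example, and the empty line that terminates the header section is appended at the end.

```python
from typing import Iterable


def build_headers(domain: str, start_time: int, sender_id: str, app_name: str,
                  schemas: Iterable[str], content: str = "text",
                  protocol: int = 3) -> str:
    """Return the textual header block, including the closing empty line."""
    # The domain key is named experiment-id before V4.
    domain_key = "domain" if protocol >= 4 else "experiment-id"
    lines = [
        f"protocol: {protocol}",
        f"{domain_key}: {domain}",
        f"start-time: {start_time}",
        f"sender-id: {sender_id}",
        f"app-name: {app_name}",
    ]
    # One schema line per measurement stream carried by the connection.
    lines += [f"schema: {schema}" for schema in schemas]
    lines.append(f"content: {content}")
    # Each header ends with a newline; an empty line closes the section.
    return "\n".join(lines) + "\n\n"
```

For instance, `build_headers("ex1", 1281591603, "sender1", "generator", ["1 generator_sin label:string phase:double value:double", "2 generator_lin label:string counter:long"])` reproduces the example above, apart from the trailing spaces on its schema lines.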

Text Protocol

The text protocol is meant to simplify sourcing of measurement streams from applications written in languages which are not supported by the OML library or where the OML library is considered too heavy. It is primarily envisioned for low-volume streams which do not require additional client-side filtering. There are native instrumentation libraries (liboml2, OML4R, OML4Py), but implementing the protocol from scratch in any language of choice should be very straightforward.

The text protocol simply serialises metadata and values of a tuple as one newline-terminated (\n), tab-separated (\t) line per sample.

The textual representation of the types defined above is as follows:
  • All numeric types are represented as decimal strings suitable for strtod(3) and siblings; using snprintf, with the proper PRIuN format if needed, should work well (at least V>=2; as of V<=3, there is no guarantee for the interpretation of non-decimal notations);
  • Strings are represented directly (except for the nil-terminator) but some character values require special processing;
    • As the text protocol assigns special meaning to the tab and newline characters, they would confuse the parser if they appeared verbatim. To avoid this, a simple backslash encoding is used: tab characters are represented by the string "\t", newlines by the string "\n" and backslash itself by the string "\\" (V>=4; no other backslash expansion is made TODO what if \whatever is input?);
  • BLOBs are encoded using BASE64 encoding and the resulting string is sent. No line breaks are permitted within the BASE64-encoded string (V>=4);
  • GUIDs are globally unique IDs used to link different measurements. These are treated as large numbers and thus represented as UINT64, unsigned decimal strings. (V>=4);
  • bools are encoded as any case-insensitive stem (prefix) of FALSE or TRUE (e.g., fAL, trUe, but generally F and T will suffice), meaning False or True respectively; any other value, including '0', is considered True (V>=4).
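
The V>=4 text representations listed above can be sketched as follows. All function names are hypothetical. Two behaviours are assumptions rather than part of this specification: an unknown backslash sequence is passed through unchanged when decoding (the document leaves this open), and an empty string is not treated as a stem of FALSE.

```python
import base64


def escape_string(value: str) -> str:
    """Backslash-encode tab, newline and backslash for the text protocol."""
    # Backslashes go first so the ones added for \t and \n are not doubled.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))


def unescape_string(text: str) -> str:
    """Reverse escape_string()."""
    expansions = {"t": "\t", "n": "\n", "\\": "\\"}
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in expansions:
            out.append(expansions[text[i + 1]])
            i += 2
        else:
            # Assumption: anything else, unknown escapes included, is kept as is.
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_blob(data: bytes) -> str:
    """BASE64-encode a blob; b64encode() inserts no line breaks."""
    return base64.b64encode(data).decode("ascii")


def decode_bool(text: str) -> bool:
    """False only for a case-insensitive stem of FALSE; True otherwise."""
    stem = text.upper()
    # Assumption: the empty string does not count as a stem of FALSE.
    is_false = stem != "" and "FALSE".startswith(stem)
    return not is_false
```

With these helpers, `decode_bool("fAL")` is False, while `decode_bool("trUe")` and `decode_bool("0")` are both True.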

Example

This example shows two streams, matching the schemas from the headers.

TODO: Add example string and blob

0.903816    2    0    sample-1    1
0.903904    1    0    sample-1    0.000000    0.000000
1.903944    2    1    sample-2    2
1.903961    1    1    sample-2    0.628319    0.587785
2.460049    2    3    sample-3    3
2.460557    1    3    sample-3    1.256637    0.951057
3.461064    2    4    sample-4    4
3.461103    1    4    sample-4    1.884956    0.951056
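
Lines such as those above could be produced as in the following sketch. The function name `serialise_text` is hypothetical, the field order (timestamp, stream_id, seq_no, then the values) is the one visible in the example, and the six-decimal timestamp format simply mirrors it. The values are assumed to be already converted to their textual representation.

```python
from typing import Iterable


def serialise_text(timestamp: float, stream_id: int, seq_no: int,
                   values: Iterable[str]) -> str:
    """Return one tab-separated, newline-terminated sample line."""
    fields = [f"{timestamp:.6f}", str(stream_id), str(seq_no)]
    # Values must already be text-encoded (escaped strings, BASE64 blobs...).
    fields.extend(values)
    return "\t".join(fields) + "\n"
```

For example, `serialise_text(0.903816, 2, 0, ["sample-1", "1"])` gives the first line of the example, with tabs between the fields.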

Binary Protocol

See binary marshalling, as described in Doxygen documentation.