OML Measurement Stream Protocol (OMSP) Specification
This document is a work in progress that aims to properly specify the OML Measurement Stream Protocol (OMSP) in its various flavours.
Generalities
The protocol is loosely modelled after HTTP. The client first sends a few textual headers, then switches to either the text or the binary protocol to serialise tuples, following a previously advertised schema. Both modes include contextual information with each tuple. The server sends no feedback to the client.
The client first opens a connection to an OML server and sends a header followed by a sequence of measurement tuples. The header consists of a sequence of key/value pairs representing parameters valid for the whole connection, each terminated by a newline. The headers are also used to declare the schemas according to which measurement tuples will be streamed. The end of the header section is identified by an empty line. Each measurement tuple is then serialised according to the mode selected in the headers. In text mode, this is a series of newline-terminated strings containing tab-separated, text-based serialisations of each element, while the binary mode encodes the data following a specific marshalling. Clients end the session by simply closing the TCP connection.
There are currently three stable versions of the OML protocol (V1 to V3), plus a fourth in development. They are currently backward compatible.
- OMSP V1 was the initial protocol, inherited from OML (version 1!);
- OMSP V2 introduced more precise types (28daef3f), and was released with OML 2.4.0;
- OMSP V3 introduced changes to the binary protocol to support blobs and, incidentally, longer marshalled packets (6d8f0597), and was released with OML 2.5.0;
- OMSP V4 is currently in development, and anything referring to it in this document is not stable yet. Versions 1 to 3, however, are.
Schema Definition
Schemas describe the name, type and order of the values defining a sample in a measurement stream.
Schema declarations are a space-delimited sequence of name/type pairs. The name and type in each pair are separated by a colon (:). The following types are supported, with the protocol versions (V) in which they are available:
- int32 (V>=1)
- uint32 (V>=2)
- int64 (V>=2)
- uint64 (V>=2)
- double (V>=2)
- string
- blob (V>=3)
- guid (V>=4)
- bool (V>=4)
- int (V<2, mapped to int32 in V>=3)
- integer (V<2, mapped to int32 in V>=3)
- long (V<2, clamped and mapped to int32 in V>=3)
- float (V<2, mapped to double in V>=3)
A full schema also has a name, prepended to its definition and separated from it by a space. This name must consist only of alphanumeric characters and underscores, and must start with a letter or an underscore, i.e., it must match /[_A-Za-z][_A-Za-z0-9]*/. The same rule applies to the names of the elements of the schema.
Each client should number its measurement streams sequentially, starting from 1 (not 0), and prepend that number to their schema definitions. This number is later used to label tuples following this schema, and allows them to be grouped together in the storage backend.
Example
1 generator_sin label:string phase:double value:double
2 generator_lin label:string counter:long
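The declaration rules above can be illustrated with a small parser. This is a minimal Python sketch, not part of any OML library: parse_schema, TYPE_MIN_VERSION and LEGACY_TYPES are hypothetical names, and it assumes that the legacy V<2 type names are always accepted and mapped to their V>=3 equivalents (the clamping of long values is not modelled).

```python
import re

# Schema and element names: a letter or underscore, then letters, digits or underscores.
NAME_RE = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")

# Minimum protocol version for each type, following the list above.
TYPE_MIN_VERSION = {
    "int32": 1, "uint32": 2, "int64": 2, "uint64": 2, "double": 2,
    "string": 1, "blob": 3, "guid": 4, "bool": 4,
}

# Legacy (V<2) type names and the types they are mapped to (assumption: always accepted).
LEGACY_TYPES = {"int": "int32", "integer": "int32", "long": "int32", "float": "double"}


def parse_schema(decl, protocol=3):
    """Parse e.g. '1 generator_sin label:string phase:double' into (number, name, fields)."""
    number, name, *pairs = decl.split()  # raises ValueError if fewer than two tokens
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    fields = []
    for pair in pairs:
        field, sep, oml_type = pair.partition(":")
        if not sep or not NAME_RE.fullmatch(field):
            raise ValueError(f"invalid element: {pair!r}")
        if oml_type in LEGACY_TYPES:
            oml_type = LEGACY_TYPES[oml_type]  # accepted, then mapped to the newer type
        elif oml_type not in TYPE_MIN_VERSION:
            raise ValueError(f"unknown type: {oml_type!r}")
        elif protocol < TYPE_MIN_VERSION[oml_type]:
            raise ValueError(f"{oml_type} requires protocol >= {TYPE_MIN_VERSION[oml_type]}")
        fields.append((field, oml_type))
    return int(number), name, fields
```

For instance, parse_schema("2 generator_lin label:string counter:long") returns (2, 'generator_lin', [('label', 'string'), ('counter', 'int32')]).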
Schema 0 (OMSP V>=4)
Schema 0 is a specific, hard-coded stream for metadata. Its core elements are two fields, named key and value. Data from this stream is stored in the same way as any other data, but its semantics differ in that it only describes and adds information about other measurement streams. Metadata follows a Subject-Key-Value model, where the key/value pair is an attribute of a specific subject. Subjects are expressed in dotted notation. The default subject, ., is the experiment itself. Schemas are at the second level, and their fields at the third level (e.g., .a refers to all of schema a, while .a.f refers only to its field f).
To support this, schema 0 is therefore:
0 _experiment_metadata subject:string key:string value:string
On the server side, everything gets stored in the _experiment_metadata table. However, additional processing might happen. For example, if the key schema is set for subject . (the experiment root), a new schema is defined at the collection point so that new measurement streams (MSs) can be sent.
In case of reconnection, it is up to the client whether or not to re-send this metadata. This is particularly relevant if a new schema was defined later on. The server MAY store duplicate metadata if this happens.
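The dotted subject notation can be resolved mechanically. The following Python sketch uses a hypothetical resolve_subject helper that maps a subject to a (schema, field) pair, with None standing for "the whole experiment" or "the whole schema"; it assumes subjects always start with a dot and have at most three levels, as in the examples above.

```python
def resolve_subject(subject):
    """Map '.', '.a' or '.a.f' to (schema, field); None means 'not narrowed further'."""
    if not subject.startswith("."):
        raise ValueError(f"subject must start with '.': {subject!r}")
    # '.' is the experiment root; otherwise drop the leading dot and split the levels.
    parts = [] if subject == "." else subject[1:].split(".")
    if len(parts) > 2 or "" in parts:
        raise ValueError(f"malformed subject: {subject!r}")
    schema = parts[0] if parts else None
    field = parts[1] if len(parts) == 2 else None
    return schema, field
```

With this sketch, resolve_subject(".") gives (None, None), resolve_subject(".a") gives ("a", None) and resolve_subject(".a.f") gives ("a", "f").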
Time-stamping and book-keeping
Prior to serialising tuples according to their schema, three elements are inserted.
- timestamp: a double timestamp in seconds relative to the start-time sent in the headers;
- stream_id: an integer (marshalled specifically as a uint8_t in binary mode) indicating which previously defined schema this tuple follows;
- seq_no: an int32 monotonically increasing sequence number in the context of this measurement stream.
The order of these fields varies depending on the mode (text or binary).
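A sender might keep this book-keeping state as sketched below in Python. The class name StreamStamper is hypothetical; it assumes start-time is the integer UNIX time sent in the headers, and that sequence numbers start at 0, as in the text-protocol example later in this document. It does not handle int32 wrap-around of seq_no.

```python
import time


class StreamStamper:
    """Produce the (timestamp, stream_id, seq_no) triple prepended to each tuple."""

    def __init__(self, start_time=None):
        # start-time header value: local UNIX time in whole seconds.
        self.start_time = int(time.time()) if start_time is None else start_time
        self.next_seq = {}  # stream_id -> next sequence number

    def stamp(self, stream_id, now=None):
        # stream_id is marshalled as a uint8_t in binary mode, so keep it in 0..255.
        if not 0 <= stream_id <= 255:
            raise ValueError("stream_id must fit in a uint8_t")
        now = time.time() if now is None else now
        seq_no = self.next_seq.get(stream_id, 0)
        self.next_seq[stream_id] = seq_no + 1
        # Timestamp is a double, in seconds relative to start-time.
        return now - self.start_time, stream_id, seq_no
```

Keeping one counter per stream_id means each measurement stream gets its own monotonically increasing sequence.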
Key/Value Parameters
The connection is initially configured by setting the values of a few properties, using a key/value model. The properties (and their keys) are the following.
- protocol: OMSP version, as specified in this document. The oml2-server currently supports versions 1 to 4;
- domain (experiment-id in V<4): string identifying the experimental group (should match /[-_A-Za-z0-9]+/);
- start-time: local UNIX time in seconds, taken at the time the header is sent (see gettimeofday(2));
- sender-id (start_time in V<4): string identifying the source of this stream (should match /[_A-Za-z0-9]+/);
- app-name: string identifying the application producing the measurements (should match /[_A-Za-z0-9]+/); in the storage backend, this may be used to identify specific measurement collections (e.g., tables in SQL);
- content: encoding of the forthcoming tuples, either binary for the binary protocol or text for the text protocol;
- schema: describes the schema of each measurement stream, as detailed previously.
- These parameters can only be set as part of the headers, and are not valid once the server expects serialised measurements (V<4).
- Since V4, key/value metadata can also be sent along with tuples using schema 0, defined above. All the key/value parameters presented here are invalid in schema 0 and will be rejected by the server, except for the key schema itself, which allows schemas to be (re)defined (XXX including schema 0?).
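A collection point, or a careful client, could check these parameters before use. The Python sketch below is only illustrative: check_param and allowed_in_schema0 are hypothetical helpers, experiment-id is treated as an alias of domain, and start-time is assumed to be a whole number of seconds, as in the header example below.

```python
import re

# Value patterns from the parameter list above.
PATTERNS = {
    "domain": re.compile(r"[-_A-Za-z0-9]+"),
    "sender-id": re.compile(r"[_A-Za-z0-9]+"),
    "app-name": re.compile(r"[_A-Za-z0-9]+"),
}
HEADER_KEYS = {"protocol", "domain", "start-time", "sender-id", "app-name", "content", "schema"}
ALIASES = {"experiment-id": "domain"}  # V<4 name of the same parameter


def check_param(key, value):
    """Raise ValueError if a header key/value pair breaks the rules above."""
    key = ALIASES.get(key, key)
    if key not in HEADER_KEYS:
        raise ValueError(f"unknown key: {key!r}")
    if key == "protocol" and value not in {"1", "2", "3", "4"}:
        raise ValueError(f"unsupported protocol: {value!r}")
    if key == "start-time":
        int(value)  # whole UNIX seconds; raises ValueError otherwise
    if key == "content" and value not in {"text", "binary"}:
        raise ValueError(f"unknown content: {value!r}")
    if key in PATTERNS and not PATTERNS[key].fullmatch(value):
        raise ValueError(f"invalid {key}: {value!r}")


def allowed_in_schema0(key):
    """V>=4: header parameter keys are rejected in schema 0, except 'schema'."""
    return key == "schema" or ALIASES.get(key, key) not in HEADER_KEYS
```

Any other key (for example a unit annotation) remains free-form metadata and is accepted in schema 0 by this sketch.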
Information storage
This section is only informative and describes the mapping from OMSP elements to database storage.
With the current SQL backends, the information is used or stored as follows (V<4; OML<2.10).
- The protocol and content keys are specific to the protocol and never appear in the backend storage;
- The domain is used to group measurements together (i.e., in the same database with that name);
- The start-time of the earliest client (with some offset towards the past) is saved as a key/value pair in the _experiment_metadata table;
- The sender-id is associated with a unique integer by the server. This mapping is stored in the _senders table, and reused to label tuples originating from this sender in other tables (oml_sender_id column); TODO can this be wrapped as metadata? Maybe not...
- The app-name is used to name tables from a specific application by prepending it to the name of the Measurement Point (e.g., APPNAME_MPNAME); XXX This is actually done on the client side, app-name is never used by the collection point. TODO Maybe we should only use the measurement point name, and store the sender-id/app-name together.
- The schemas are used to define new tables to store measurement tuples, named as per the previous scheme; they are also stored in the _experiment_metadata table. The measurement tables contain at least the following columns:
  - id: a primary key for the table; each row has a different, monotonically increasing ID;
  - oml_sender_id: an integer which can be found in the _senders table;
  - oml_seq: a record of the seq_no sent with each tuple;
  - oml_ts_client: the offset from start-time of when that tuple was serialised;
  - oml_ts_server: the same offset rebased in the server's timeframe (by adding the difference of the server's time and the start-time header upon connection from the particular sender);
  - each element of the schema, in order, with a database type able to store the information of the OML type.
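For illustration, the naming rules above can be written down as follows. This Python sketch is hypothetical and does not reproduce the oml2-server code: schema_name builds the APPNAME_MPNAME name on the client side (per the XXX note), table_layout lists the table name and column names for a parsed schema, and column types are deliberately left out, since they depend on the backend.

```python
def schema_name(app_name, mp_name):
    """Client side: prefix the Measurement Point name with the application name."""
    return f"{app_name}_{mp_name}"


# Book-keeping columns present in every measurement table, as listed above.
BOOKKEEPING_COLUMNS = ["id", "oml_sender_id", "oml_seq", "oml_ts_client", "oml_ts_server"]


def table_layout(name, fields):
    """Server side: table name and column names for a schema's (name, type) fields."""
    return name, BOOKKEEPING_COLUMNS + [field for field, _oml_type in fields]
```

With the example schemas, schema_name("generator", "sin") yields generator_sin, which is then used unchanged as the table name.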
Protocol
This section describes the actual encoding of the elements described above. Key/value parameters go into the headers. Starting with V4, they can also be sent alongside measurement streams using schema 0. Then, depending on the chosen content, the text or binary mode is used for measurement tuples.
Headers
The header is text-based, and used to transfer the key/value parameters of the experiment, as defined earlier in this document.
All of them have to appear exactly once, in the order they were introduced in this document.
The only exception is the schema field, which needs to appear once for every measurement stream carried by the connection.
Example
protocol: 3
experiment-id: ex1
start-time: 1281591603
sender-id: sender1
app-name: generator
schema: 1 generator_sin label:string phase:double value:double
schema: 2 generator_lin label:string counter:long
content: text
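The header block above could be produced by a function like the following Python sketch. format_headers is a hypothetical name; it follows the key order of the example (with the schema lines before content), uses experiment-id for protocol versions below 4 and domain otherwise, and ends the block with the empty line that separates headers from tuples.

```python
def format_headers(protocol, domain, start_time, sender_id, app_name, schemas, content="text"):
    """Build the header block: 'key: value' lines, ended by an empty line."""
    # The experiment group key is 'experiment-id' before V4 and 'domain' from V4 on.
    domain_key = "domain" if protocol >= 4 else "experiment-id"
    lines = [
        f"protocol: {protocol}",
        f"{domain_key}: {domain}",
        f"start-time: {start_time}",
        f"sender-id: {sender_id}",
        f"app-name: {app_name}",
    ]
    lines += [f"schema: {decl}" for decl in schemas]
    lines.append(f"content: {content}")
    # Each line is newline-terminated; the extra newline is the empty end-of-header line.
    return "\n".join(lines) + "\n\n"
```

A client would write this string to the TCP connection before sending any tuple in the selected content mode.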
Text Protocol
The text protocol is meant to simplify sourcing of measurement streams from applications written in languages which are not supported by the OML library, or where the OML library is considered too heavy. It is primarily envisioned for low-volume streams which do not require additional client-side filtering. Native instrumentation libraries exist (liboml2, OML4R, OML4Py), but implementing the protocol from scratch in any language of choice should be very straightforward.
The text protocol simply serialises metadata and values of a tuple as one newline-terminated (\n), tab-separated (\t) line per sample.
- All numeric types are represented as decimal strings suitable for strtod(3) and its siblings; using snprintf(3), with the proper PRIuN format macro (e.g., PRIu64) if needed, should work well (at least for V>=2; as of V<=3, there is no guarantee for the interpretation of non-decimal notations);
- Strings are represented directly (except for the nil-terminator), but some character values require special processing;
- As the text protocol assigns special meaning to the tab and newline characters, they would confuse the parser if they appeared verbatim. To avoid this, a simple backslash encoding is used: tab characters are represented by the string "\t", newlines by the string "\n", and backslash itself by the string "\\" (V>=4; no other backslash expansion is made. TODO what if \whatever is input?);
- BLOBs are encoded using BASE64 encoding and the resulting string is sent. No line breaks are permitted within the BASE64-encoded string (V>=4);
- GUIDs are globally unique IDs used to link different measurements. They are treated as large numbers, and thus represented as UINT64 unsigned decimal strings (V>=4);
- bools are encoded as any case-insensitive prefix of FALSE or TRUE (e.g., fAL, trUe, but generally F and T will suffice), meaning False or True respectively; any other value is considered True, including '0' (V>=4).
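The encoding rules above can be sketched in Python as follows. encode_value, escape_string and decode_bool are hypothetical helpers; doubles are formatted with repr(), whose shortest round-trip output (possibly in exponent notation) is accepted by strtod(3), and decode_bool treats the empty string as True, a case the rules above leave unspecified.

```python
import base64


def escape_string(s):
    # Escape backslashes first, so the ones added for tabs and newlines are not doubled.
    return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def encode_value(oml_type, value):
    """Text-mode serialisation of one value of the given OML type."""
    if oml_type == "string":
        return escape_string(value)
    if oml_type == "blob":
        return base64.b64encode(value).decode("ascii")  # single line, no breaks
    if oml_type == "bool":
        return "T" if value else "F"
    if oml_type == "double":
        return repr(float(value))
    # int32, uint32, int64, uint64 and guid: plain decimal integers.
    return str(int(value))


def decode_bool(text):
    """Any non-empty case-insensitive prefix of FALSE is False; anything else is True."""
    return not (text and "false".startswith(text.lower()))
```

Note that decode_bool("0") is True, matching the rule that '0' is not a prefix of FALSE.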
Example
This example shows two streams, matching the schemas from the headers.
TODO: Add example string and blob
0.903816 2 0 sample-1 1
0.903904 1 0 sample-1 0.000000 0.000000
1.903944 2 1 sample-2 2
1.903961 1 1 sample-2 0.628319 0.587785
2.460049 2 3 sample-3 3
2.460557 1 3 sample-3 1.256637 0.951057
3.461064 2 4 sample-4 4
3.461103 1 4 sample-4 1.884956 0.951056
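A minimal Python sketch of how one line of the example above is assembled and split again. format_tuple and parse_tuple are hypothetical helpers; values are assumed to be already encoded as text, timestamps are printed with six decimal places as in the example, and parse_tuple returns the raw field strings without unescaping or type conversion.

```python
def format_tuple(timestamp, stream_id, seq_no, encoded_values):
    """Text-mode line: timestamp, stream_id, seq_no, then the schema's values."""
    fields = [f"{timestamp:f}", str(stream_id), str(seq_no), *encoded_values]
    return "\t".join(fields) + "\n"


def parse_tuple(line, schemas):
    """Split a line using schemas, a {stream_id: [element names]} mapping."""
    ts, stream_id, seq_no, *values = line.rstrip("\n").split("\t")
    names = schemas[int(stream_id)]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return float(ts), int(stream_id), int(seq_no), dict(zip(names, values))
```

Because the stream_id comes before the values, a reader can pick the right schema before interpreting the rest of the line.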
Binary Protocol
See the binary marshalling, as described in the Doxygen documentation.