
Ingesting Data in Schemaless Mode

In IoT applications, to achieve functions such as automated management, business analysis, and device monitoring, it is often necessary to collect a large number of data items. However, due to reasons such as application logic upgrades and hardware adjustments of the devices themselves, the data collection items may change frequently. To address this challenge, TDengine provides a schemaless writing method, aimed at simplifying the data recording process.

With the schemaless writing method, users do not need to create supertables or subtables in advance, as TDengine will automatically create the corresponding storage structures based on the actual data written. Additionally, when necessary, the schemaless writing method can also automatically add necessary data columns or tag columns to ensure that the data written by users is correctly stored.

It is worth noting that the supertables and their corresponding subtables created through the schemaless writing method have no functional differences from those created directly through SQL. Users can still use SQL to write data directly into them. However, since the table names generated by the schemaless writing method are based on tag values according to a fixed mapping rule, these table names may lack readability and are not easy to understand.

When using the schemaless writing method, tables are created automatically, and manual creation of tables may lead to unknown errors.

Schemaless Writing Line Protocol

TDengine's schemaless writing line protocol is compatible with InfluxDB's line protocol, OpenTSDB's telnet line protocol, and OpenTSDB's JSON format protocol. For the standard writing protocols of InfluxDB and OpenTSDB, please refer to their respective official documentation.

Below, we first introduce the protocol content extended by TDengine on top of InfluxDB's line protocol. This protocol allows users to control the (supertable) schema in a more detailed manner. Each data row is expressed as a string, and multiple row strings can be passed to the write API at once to batch-write multiple rows, with the format specified as follows.

measurement,tag_set field_set timestamp

The parameters are explained as follows.

  • measurement is the table name, separated from tag_set by a comma.
  • tag_set is formatted as <tag_key>=<tag_value>,<tag_key>=<tag_value>, representing tag column data; entries are separated by commas, and tag_set is separated from field_set by a space.
  • field_set is formatted as <field_key>=<field_value>,<field_key>=<field_value>, representing ordinary columns; entries are also separated by commas, and field_set is separated from the timestamp by a space.
  • timestamp is the primary key timestamp for this row of data.
  • Schemaless writing does not support writing data for tables with a second primary key column.
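As a minimal illustration of the format above, a row string can be assembled with plain string concatenation; the measurement, tag, and field names below are invented for this sketch:

```java
public class LineBuilder {
    // Assemble one schemaless row: measurement,tag_set field_set timestamp
    static String buildRow(String measurement, String tagSet, String fieldSet, long ts) {
        return measurement + "," + tagSet + " " + fieldSet + " " + ts;
    }

    public static void main(String[] args) {
        System.out.println(buildRow("meters",
                "groupid=2,location=California.SanFrancisco",
                "current=10.3f64,voltage=219i32",
                1626006833639L));
    }
}
```

The resulting string can then be passed to the schemaless write API together with many other rows for batch writing.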

All data in tag_set are automatically converted to nchar data type and do not need to use double quotes. In the schemaless writing line protocol, each data item in field_set needs to describe its own data type, with specific requirements as follows.

  • If enclosed in double quotes, it represents varchar type, e.g., "abc".
  • If enclosed in double quotes and prefixed with L or l, it represents nchar type, e.g., L" error message ".
  • If enclosed in double quotes and prefixed with G or g, it represents geometry type, e.g., G"Point(4.343 89.342)".
  • If enclosed in double quotes and prefixed with B or b, it represents varbinary type; the content inside the double quotes can be a hexadecimal string starting with \x or a plain string, e.g., B"\x98f46e" and B"hello".
  • Spaces, equal signs (=), commas (,), double quotes ("), and backslashes (\) must be escaped with a backslash (\) (all in half-width English characters). The field escape rules for the schemaless writing protocol are shown in the following table.
| Number | Field           | Characters to Escape       |
|--------|-----------------|----------------------------|
| 1      | Supertable name | comma, space               |
| 2      | Tag name        | comma, equal sign, space   |
| 3      | Tag value       | comma, equal sign, space   |
| 4      | Column name     | comma, equal sign, space   |
| 5      | Column value    | double quotes, backslash   |

If two consecutive backslashes are used, the first backslash acts as an escape character; if there is only one backslash, no escape is needed. The backslash escape rules for the schemaless writing protocol are shown in the following table.

| Number | Backslash | Escapes to |
|--------|-----------|------------|
| 1      | \         | \          |
| 2      | \\        | \          |
| 3      | \\\       | \\         |
| 4      | \\\\      | \\         |
| 5      | \\\\\     | \\\        |
| 6      | \\\\\\    | \\\        |
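As an illustrative sketch (not TDengine's internal implementation), the field escape rules above can be expressed as small Java helpers, with the helper names invented here:

```java
public class SmlEscape {
    // Supertable names need commas and spaces escaped with a backslash.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    // Tag names, tag values, and column names need commas, equal signs,
    // and spaces escaped with a backslash.
    static String escapeName(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Quoted column values need backslashes and double quotes escaped.
    static String escapeColumnValue(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(escapeMeasurement("my table")); // my\ table
        System.out.println(escapeName("room temp"));       // room\ temp
        System.out.println(escapeColumnValue("say \"hi\"")); // say \"hi\"
    }
}
```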

Numeric types are distinguished by suffixes. The suffix rules for numeric types in the schemaless writing protocol are shown in the following table.

| Number | Suffix      | Mapped Type                       | Size (Bytes) |
|--------|-------------|-----------------------------------|--------------|
| 1      | None or f64 | double                            | 8            |
| 2      | f32         | float                             | 4            |
| 3      | i8/u8       | TinyInt/UTinyInt                  | 1            |
| 4      | i16/u16     | SmallInt/USmallInt                | 2            |
| 5      | i32/u32     | Int/UInt                          | 4            |
| 6      | i64/i/u64/u | BigInt/BigInt/UBigInt/UBigInt     | 8            |
  • t, T, true, True, TRUE, f, F, false, False will be directly treated as BOOL type.
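The suffix rules can be applied when formatting field values; the following sketch uses hypothetical helper names to produce typed value tokens (note that Java's default number-to-string conversion prints a double 4 as "4.0", slightly different from the hand-written examples in this document):

```java
public class SmlValues {
    // Hypothetical helpers that append the schemaless type suffix to a value.
    static String asDouble(double v)  { return v + "f64"; }  // None or f64 -> double
    static String asFloat(float v)    { return v + "f32"; }  // f32 -> float
    static String asInt(int v)        { return v + "i32"; }  // i32 -> Int
    static String asBigInt(long v)    { return v + "i64"; }  // i64 -> BigInt
    static String asBool(boolean v)   { return Boolean.toString(v); } // BOOL literal
    static String asVarchar(String v) { return "\"" + v + "\""; }     // varchar
    static String asNchar(String v)   { return "L\"" + v + "\""; }    // nchar

    public static void main(String[] args) {
        // c1 as BIGINT, c2 as BOOL, c3 as VARCHAR, c4 as DOUBLE
        System.out.println("c1=" + asBigInt(3) + ",c2=" + asBool(false)
                + ",c3=" + asVarchar("passit") + ",c4=" + asDouble(4));
    }
}
```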

For example, the following data line indicates that, under the supertable named st, a row is written to the subtable with tags t1 as "3" (NCHAR), t2 as "4" (NCHAR), and t3 as "t3" (NCHAR), with column c1 as 3 (BIGINT), c2 as false (BOOL), c3 as "passit" (BINARY), c4 as 4 (DOUBLE), and primary key timestamp 1626006833639000000.

st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000

Note that if there is a case error in describing the data type suffix or the data type specified for the data is incorrect, it may trigger an error message and cause data writing to fail.

TDengine provides idempotence for data writing, meaning you can repeatedly call the API to write data that failed previously. However, it does not provide atomicity for writing multiple rows of data. That is, during the batch writing process of multiple rows of data, some data may be written successfully while others may fail.

Schema-less Write Handling Rules

Schema-less writes handle row data according to the following principles:

  1. The subtable name is generated using the following rules: first, combine the measurement name with the tag's key and value into the following string:

    "measurement,tag_key1=tag_value1,tag_key2=tag_value2"
    • Note that tag_key1, tag_key2, etc. are not in the order originally entered by the user, but are sorted in ascending order by tag name; tag_key1 is therefore not necessarily the first tag entered in the line protocol. After sorting, the MD5 hash value "md5_val" of this string is calculated and used to generate the table name "t_md5_val". The "t_" is a fixed prefix, and every table automatically generated through this mapping carries it.

    • If you do not want to use the automatically generated table name, there are two ways to specify the subtable name (the first method has higher priority).

    1. By configuring the smlAutoChildTableNameDelimiter parameter in taos.cfg (the delimiter may not be @, #, a space, CR, LF, or tab).
      1. For example: configure smlAutoChildTableNameDelimiter=- and insert data as st,t0=cpu1,t1=4 c1=3 1626006833639000000; the created table name will be cpu1-4.
    2. By configuring the smlChildTableName parameter in taos.cfg.
      1. For example: configure smlChildTableName=tname and insert data as st,tname=cpu1,t1=4 c1=3 1626006833639000000; the created table name will be cpu1. Note that if multiple rows of data have the same tname but different tag_sets, the tag_set specified when the table was first automatically created is used, and the differing tag_sets in the other rows are ignored.
  2. If the supertable obtained from parsing the line protocol does not exist, it will be created (it is not recommended to manually create supertables, otherwise data insertion may be abnormal).

  3. If the subtable obtained from parsing the line protocol does not exist, Schemaless will create the subtable according to the subtable name determined in step 1 or 2.

  4. If the tag columns or regular columns specified in the data row do not exist, they are added to the supertable (columns are only ever added, never removed).

  5. If some tag columns or regular columns exist in the supertable but are not specified in a data row, their values are set to NULL in that row.

  6. For BINARY or NCHAR columns, if the length of a value provided in the data row exceeds the column type's limit, the column's maximum storage length is automatically increased (lengths can only grow, never shrink) to ensure complete storage of the data.

  7. Errors encountered during the entire processing process will interrupt the writing process and return an error code.

  8. To improve writing efficiency, it is assumed by default that the order of the field_set is the same for all rows of the same supertable (the first row contains all fields, and subsequent rows follow that order). If the order differs, set the smlDataFormat parameter to false; otherwise the data will be written in the assumed order and the data in the database will be incorrect. Starting from version 3.0.3.0, the order is checked automatically and this configuration is deprecated.

  9. Since SQL table creation does not support dots (.), Schemaless also processes dots (.) in automatically created table names, replacing them with underscores (_). If the subtable name is manually specified and contains a dot (.), it will also be converted to an underscore (_).

  10. The smlTsDefaultName configuration in taos.cfg (a string value) takes effect only on the client side. After it is configured, the time column name for Schemaless automatic table creation is taken from this setting. If it is not configured, the default is _ts.

  11. The supertable or subtable names in schema-less writing are case-sensitive.

  12. Schema-less writing still follows TDengine's underlying restrictions on data structures, such as the total length of each row of data cannot exceed 48KB (from version 3.0.5.0 it is 64KB), and the total length of tag values cannot exceed 16KB.
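The subtable-name rule from step 1 can be sketched as follows. This is an illustrative reimplementation, not TDengine's internal code; the helper name and structure are invented here:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class SubtableName {
    // Sketch of the naming rule: sort tags by key in ascending order, join them
    // with the measurement into "measurement,k1=v1,k2=v2", MD5-hash the string,
    // and prefix the hex digest with "t_".
    static String subtableName(String measurement, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder(measurement);
        for (Map.Entry<String, String> e : new TreeMap<>(tags).entrySet()) {
            sb.append(',').append(e.getKey()).append('=').append(e.getValue());
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("t_");
            for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        // Same name regardless of the order tags were entered in.
        System.out.println(subtableName("st", Map.of("t2", "4", "t1", "3", "t3", "t3")));
    }
}
```

Because the tags are sorted before hashing, rows that list the same tags in a different order still map to the same subtable.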

Time Resolution Recognition

Schema-less writing supports three specified modes, as shown in the table below:

| Number | Value               | Description                 |
|--------|---------------------|-----------------------------|
| 1      | SML_LINE_PROTOCOL   | InfluxDB Line Protocol      |
| 2      | SML_TELNET_PROTOCOL | OpenTSDB Text Line Protocol |
| 3      | SML_JSON_PROTOCOL   | JSON Protocol Format        |

In the SML_LINE_PROTOCOL parsing mode, users need to specify the time resolution of the input timestamp. The available time resolutions are as follows:

| Number | Time Resolution Definition         | Meaning             |
|--------|------------------------------------|---------------------|
| 1      | TSDB_SML_TIMESTAMP_NOT_CONFIGURED  | Undefined (invalid) |
| 2      | TSDB_SML_TIMESTAMP_HOURS           | Hours               |
| 3      | TSDB_SML_TIMESTAMP_MINUTES         | Minutes             |
| 4      | TSDB_SML_TIMESTAMP_SECONDS         | Seconds             |
| 5      | TSDB_SML_TIMESTAMP_MILLI_SECONDS   | Milliseconds        |
| 6      | TSDB_SML_TIMESTAMP_MICRO_SECONDS   | Microseconds        |
| 7      | TSDB_SML_TIMESTAMP_NANO_SECONDS    | Nanoseconds         |

In the SML_TELNET_PROTOCOL and SML_JSON_PROTOCOL modes, the time precision is determined by the length of the timestamp (consistent with the standard operation of OpenTSDB), and the user-specified time resolution will be ignored.
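A minimal sketch of this digit-count convention (an illustration of the OpenTSDB rule, not the actual parser): a Unix timestamp with up to 10 digits is in seconds, while a 13-digit timestamp is in milliseconds.

```java
public class TsPrecision {
    // Heuristic matching OpenTSDB's convention: precision is inferred from the
    // number of digits in the timestamp rather than from a user setting.
    static String precisionOf(long ts) {
        int digits = Long.toString(ts).length();
        if (digits <= 10) return "seconds";
        if (digits == 13) return "milliseconds";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(precisionOf(1626846400L));    // 10 digits -> seconds
        System.out.println(precisionOf(1626006833639L)); // 13 digits -> milliseconds
    }
}
```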

Data Mode Mapping Rules

Data from the InfluxDB line protocol is mapped to schema-based data: the measurement maps to the supertable name, tag names in the tag_set map to tag names in the data schema, and names in the field_set map to column names. For example, consider the following data:

st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4f64 1626006833639000000

This line of data maps to create a supertable: st, which includes 3 tags of type nchar: t1, t2, t3, and five data columns: _ts (timestamp), c1 (bigint), c3 (binary), c2 (bool), c4 (double). It maps to the following SQL statement:

create stable st (_ts timestamp, c1 bigint, c2 bool, c3 binary(6), c4 double) tags(t1 nchar(1), t2 nchar(1), t3 nchar(2))
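As an illustration of how the mapped column widths follow the value lengths ("passit" maps to binary(6), the tag value "t3" maps to nchar(2)); the helper names below are hypothetical:

```java
public class SchemaMapping {
    // Hypothetical helpers: the mapped width of a string column or tag
    // follows the length of the value seen in the data row.
    static String binaryCol(String name, String value) {
        return name + " binary(" + value.length() + ")";
    }

    static String ncharTag(String name, String value) {
        return name + " nchar(" + value.length() + ")";
    }

    public static void main(String[] args) {
        System.out.println(binaryCol("c3", "passit")); // c3 binary(6)
        System.out.println(ncharTag("t3", "t3"));      // t3 nchar(2)
        System.out.println(ncharTag("t1", "3"));       // t1 nchar(1)
    }
}
```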

Data Mode Change Handling

This section will explain the impact on the data schema under different line data writing scenarios.

When a field's data type is written with an explicit identifier via the line protocol, a later change to that field's type definition produces a clear data schema error, triggering the write API to report an error, as shown below:

st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4    1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c3="passit",c2=false,c4=4i 1626006833640000000

The data type mapping of the first line defines the c4 column as Double, but the second line declares the column as BigInt through a numeric suffix, thus triggering a parsing error in schema-less writing.

If the line protocol in earlier rows declares a data column as binary, and a subsequent row requires a longer binary length, this triggers a change in the supertable schema:

st,t1=3,t2=4,t3=t3 c1=3i64,c5="pass"     1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c5="passit" 1626006833640000000

The line protocol parsing of the first line declares column c5 as a binary(4) field. The second line indicates that c5 is still a binary column, but with a width of 6; the binary width must therefore be increased to accommodate the new string.

st,t1=3,t2=4,t3=t3 c1=3i64               1626006833639000000
st,t1=3,t2=4,t3=t3 c1=3i64,c6="passit" 1626006833640000000

The second line of data adds a column c6 relative to the first line, with a type of binary(6). Thus, a column c6, type binary(6), will be automatically added.

Schemaless Writing Example

Below, using smart meters as an example, we introduce code samples for writing data using the schemaless writing interface with various language connectors. This includes three protocols: InfluxDB's line protocol, OpenTSDB's TELNET line protocol, and OpenTSDB's JSON format protocol.

note
  • Since the rules for automatic table creation with schemaless writing differ from those in the previous SQL examples, please ensure that the meters, metric_telnet, and metric_json tables do not exist before running the code samples.
  • OpenTSDB's TELNET line protocol and OpenTSDB's JSON format protocol support only a single data column, so these examples use metrics other than the multi-column smart-meter data.

WebSocket Connection

import com.taosdata.jdbc.AbstractConnection;
import com.taosdata.jdbc.enums.SchemalessProtocolType;
import com.taosdata.jdbc.enums.SchemalessTimestampType;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class SchemalessWsTest {
    private static final String host = "127.0.0.1";
    private static final String lineDemo = "meters,groupid=2,location=California.SanFrancisco current=10.3000002f64,voltage=219i32,phase=0.31f64 1626006833639";
    private static final String telnetDemo = "metric_telnet 1707095283260 4 host=host0 interface=eth0";
    private static final String jsonDemo = "{\"metric\": \"metric_json\",\"timestamp\": 1626846400,\"value\": 10.3, \"tags\": {\"groupid\": 2, \"location\": \"California.SanFrancisco\", \"id\": \"d1001\"}}";

    public static void main(String[] args) throws SQLException {
        final String url = "jdbc:TAOS-WS://" + host + ":6041?user=root&password=taosdata";
        try (Connection connection = DriverManager.getConnection(url)) {
            init(connection);
            AbstractConnection conn = connection.unwrap(AbstractConnection.class);

            conn.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.MILLI_SECONDS);
            conn.write(telnetDemo, SchemalessProtocolType.TELNET, SchemalessTimestampType.MILLI_SECONDS);
            conn.write(jsonDemo, SchemalessProtocolType.JSON, SchemalessTimestampType.SECONDS);
            System.out.println("Inserted data with schemaless successfully.");
        } catch (Exception ex) {
            // please refer to the JDBC specifications for detailed exceptions info
            System.out.printf("Failed to insert data with schemaless, %sErrMessage: %s%n",
                    ex instanceof SQLException ? "ErrCode: " + ((SQLException) ex).getErrorCode() + ", " : "",
                    ex.getMessage());
            // Print stack trace for context in examples. Use logging in production.
            ex.printStackTrace();
            throw ex;
        }
    }

    private static void init(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE DATABASE IF NOT EXISTS power");
            stmt.execute("USE power");
        }
    }
}


Execute schemaless writing with reqId, where the last parameter reqId can be used for request link tracing.

writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO_SECONDS, 1L);

Native Connection

import com.taosdata.jdbc.AbstractConnection;
import com.taosdata.jdbc.enums.SchemalessProtocolType;
import com.taosdata.jdbc.enums.SchemalessTimestampType;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class SchemalessJniTest {
    private static final String host = "127.0.0.1";
    private static final String lineDemo = "meters,groupid=2,location=California.SanFrancisco current=10.3000002f64,voltage=219i32,phase=0.31f64 1626006833639";
    private static final String telnetDemo = "metric_telnet 1707095283260 4 host=host0 interface=eth0";
    private static final String jsonDemo = "{\"metric\": \"metric_json\",\"timestamp\": 1626846400,\"value\": 10.3, \"tags\": {\"groupid\": 2, \"location\": \"California.SanFrancisco\", \"id\": \"d1001\"}}";

    public static void main(String[] args) throws SQLException {
        final String jdbcUrl = "jdbc:TAOS://" + host + ":6030/?user=root&password=taosdata";
        try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
            init(connection);
            AbstractConnection conn = connection.unwrap(AbstractConnection.class);

            conn.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.MILLI_SECONDS);
            conn.write(telnetDemo, SchemalessProtocolType.TELNET, SchemalessTimestampType.MILLI_SECONDS);
            conn.write(jsonDemo, SchemalessProtocolType.JSON, SchemalessTimestampType.NOT_CONFIGURED);
            System.out.println("Inserted data with schemaless successfully.");
        } catch (Exception ex) {
            // please refer to the JDBC specifications for detailed exceptions info
            System.out.printf("Failed to insert data with schemaless, %sErrMessage: %s%n",
                    ex instanceof SQLException ? "ErrCode: " + ((SQLException) ex).getErrorCode() + ", " : "",
                    ex.getMessage());
            // Print stack trace for context in examples. Use logging in production.
            ex.printStackTrace();
            throw ex;
        }
    }

    private static void init(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE DATABASE IF NOT EXISTS power");
            stmt.execute("USE power");
        }
    }
}


Execute schemaless writing with reqId, where the last parameter reqId can be used for request link tracing.

writer.write(lineDemo, SchemalessProtocolType.LINE, SchemalessTimestampType.NANO_SECONDS, 1L);

Querying the Written Data

By running the example code from the previous section, tables are automatically created in the power database. We can query the data using the taos shell or an application. Below is an example of using the taos shell to list the supertables in the power database and query the data in the meters supertable.

taos> show power.stables;
          stable_name          |
=================================
 meter_current                 |
 stb0_0                        |
 meters                        |
Query OK, 3 row(s) in set (0.002527s)



taos> select * from power.meters limit 1 \G;
*************************** 1.row ***************************
     _ts: 2021-07-11 20:33:53.639
 current: 10.300000199999999
 voltage: 219
   phase: 0.310000000000000
 groupid: 2
location: California.SanFrancisco
Query OK, 1 row(s) in set (0.004501s)