taosBenchmark

Introduction

taosBenchmark (formerly taosdemo) is a tool for testing the performance of TDengine products. It can test the performance of TDengine's insert, query, and subscription functions and simulate large amounts of data generated by many devices. It can be configured to generate user-defined databases, supertables, subtables, and the time-series data that populates them for performance benchmarking. taosBenchmark is highly configurable: options include the time interval for inserting data, the number of working threads, and the ability to insert out-of-order data. For compatibility with existing users, the installer provides taosdemo as a soft link to taosBenchmark.

IMPORTANT

Note that in the TDengine cloud service, a non-privileged user cannot create a database with any tool, including taosBenchmark. The database must first be created in the data explorer of the TDengine cloud service console. Wherever this document describes creating a database, skip that step and create the database manually in the TDengine cloud service.

Installation

There are two ways to install taosBenchmark:

Installing the official TDengine installer will automatically install taosBenchmark.
Compile taos-tools separately and install them. Please refer to the taos-tools repository for details.

Run

Configuration and running methods

Run this command in your Linux terminal to save the cloud DSN as an environment variable:

export TDENGINE_CLOUD_DSN="<DSN>"

IMPORTANT

To obtain the value of the cloud DSN, log in to TDengine Cloud, click "Tools", and then select "taosBenchmark".

Users can use -f <json file> to specify a configuration file.

taosBenchmark supports complete performance testing of TDengine by providing write, query, and subscribe functionality. These three functions are mutually exclusive: only one can be selected each time taosBenchmark runs. Query and subscribe can be configured only through a JSON configuration file, whereas write can be configured through either the command line or a configuration file. The filetype parameter in the configuration file specifies which function to test. For example, to test query performance, run taosBenchmark with a configuration file whose filetype selects the query function.

Make sure that the TDengine cluster is running correctly before running taosBenchmark.

Run with the configuration file

A sample configuration file is provided in the taosBenchmark installation package under <install_directory>/examples/taosbenchmark-json.

Use the following command to run taosBenchmark and control its behavior via a configuration file.

taosBenchmark -f json-file
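
The two steps above (exporting the DSN, then running with -f) can also be scripted. The following is a minimal Python sketch, not part of taosBenchmark itself: the function names build_command and run_benchmark are hypothetical, and the only command-line option used is the -f option described in this document.

```python
import os
import shutil
import subprocess


def build_command(config_path, executable="taosBenchmark"):
    """Return the argument list for `taosBenchmark -f <config_path>`."""
    return [executable, "-f", str(config_path)]


def run_benchmark(config_path):
    """Hypothetical wrapper: check prerequisites, then run taosBenchmark."""
    # Mirrors the `export TDENGINE_CLOUD_DSN=...` step in this document.
    if not os.environ.get("TDENGINE_CLOUD_DSN"):
        raise RuntimeError("TDENGINE_CLOUD_DSN is not set")
    # Assumption: taosBenchmark is installed and reachable through PATH.
    if shutil.which("taosBenchmark") is None:
        raise FileNotFoundError("taosBenchmark was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(build_command(config_path), check=True)
```

Calling run_benchmark("insert.json") would then behave like typing the command by hand, with earlier failures if the DSN or the binary is missing.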

Sample configuration files

Configuration file examples

{
    "filetype": "insert",
    "cfgdir": "/etc/taos",
    "connection_pool_size": 8,
    "thread_count": 4,
    "create_table_thread_count": 7,
    "result_file": "./insert_res.txt",
    "confirm_parameter_prompt": "no",
    "insert_interval": 0,
    "interlace_rows": 100,
    "num_of_records_per_req": 100,
    "prepared_rand": 10000,
    "chinese": "no",
    "databases": [
        {
            "dbinfo": {
                "name": "test",
                "drop": "no",
                "replica": 1,
                "precision": "ms",
                "keep": 3650,
                "minRows": 100,
                "maxRows": 4096,
                "comp": 2
            },
            "super_tables": [
                {
                    "name": "meters",
                    "child_table_exists": "no",
                    "childtable_count": 10000,
                    "childtable_prefix": "d",
                    "escape_character": "yes",
                    "auto_create_table": "no",
                    "batch_create_tbl_num": 5,
                    "data_source": "rand",
                    "insert_mode": "taosc",
                    "non_stop_mode": "no",
                    "line_protocol": "line",
                    "insert_rows": 10000,
                    "childtable_limit": 10,
                    "childtable_offset": 100,
                    "interlace_rows": 0,
                    "insert_interval": 0,
                    "partial_col_num": 0,
                    "disorder_ratio": 0,
                    "disorder_range": 1000,
                    "timestamp_step": 10,
                    "start_timestamp": "2020-10-01 00:00:00.000",
                    "sample_format": "csv",
                    "sample_file": "./sample.csv",
                    "use_sample_ts": "no",
                    "tags_file": "",
                    "columns": [
                        {
                            "type": "FLOAT",
                            "name": "current",
                            "count": 1,
                            "max": 12,
                            "min": 8
                        },
                        { "type": "INT", "name": "voltage", "max": 225, "min": 215 },
                        { "type": "FLOAT", "name": "phase", "max": 1, "min": 0 }
                    ],
                    "tags": [
                        {
                            "type": "TINYINT",
                            "name": "groupid",
                            "max": 10,
                            "min": 1
                        },
                        {
                            "name": "location",
                            "type": "BINARY",
                            "len": 16,
                            "values": ["San Francisco", "Los Angeles", "San Diego",
                                "San Jose", "Palo Alto", "Campbell", "Mountain View",
                                "Sunnyvale", "Santa Clara", "Cupertino"]
                        }
                    ]
                }
            ]
        }
    ]
}
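
Before running a configuration like the one above, it can help to estimate how much data it will write. The sketch below is an illustrative Python helper (planned_rows is a hypothetical name) that multiplies childtable_count by insert_rows for each super table. It assumes new child tables are created (child_table_exists is "no") and non_stop_mode is "no", and it uses the defaults stated in this document (10 child tables, 0 rows) when a key is absent.

```python
def planned_rows(config):
    """Return {(database, super_table): childtable_count * insert_rows}."""
    totals = {}
    for db in config.get("databases", []):
        db_name = db["dbinfo"]["name"]
        for stb in db.get("super_tables", []):
            # Defaults taken from the parameter descriptions in this document.
            tables = stb.get("childtable_count", 10)
            rows = stb.get("insert_rows", 0)
            totals[(db_name, stb["name"])] = tables * rows
    return totals


# Trimmed copy of the sample configuration; only the keys used here are kept.
sample = {
    "filetype": "insert",
    "databases": [
        {
            "dbinfo": {"name": "test"},
            "super_tables": [
                {"name": "meters", "childtable_count": 10000, "insert_rows": 10000}
            ],
        }
    ],
}

print(planned_rows(sample))  # {('test', 'meters'): 100000000}
```

For the sample file this gives 100 million rows, which is worth knowing before pointing the tool at a small test instance.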

Configuration file parameters in detail

General configuration parameters

The parameters listed in this section apply to all function modes.

filetype: The function to be tested. Valid values are insert and query, which correspond to the insert and query functions, respectively. Only one can be specified in each configuration file.
cfgdir: The directory containing the TDengine cluster configuration file. The default path is /etc/taos.
host: The FQDN of the TDengine server to connect to. The default value is localhost.
port: The port number of the TDengine server to connect to. The default value is 6030.
user: The user name for connecting to the TDengine server. The default value is root.
password: The password for connecting to the TDengine server. The default value is taosdata.
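
As a rough illustration of how these defaults combine with a configuration file, the Python sketch below fills in any general parameter that the file omits. It is only a reading aid: the name effective_general_params is hypothetical, and taosBenchmark applies its own defaults internally.

```python
# Defaults as listed in this section.
GENERAL_DEFAULTS = {
    "cfgdir": "/etc/taos",
    "host": "localhost",
    "port": 6030,
    "user": "root",
    "password": "taosdata",
}


def effective_general_params(config):
    """Return the general parameters with documented defaults filled in."""
    merged = dict(GENERAL_DEFAULTS)
    for key in GENERAL_DEFAULTS:
        if key in config:
            merged[key] = config[key]
    # filetype has no default listed in this section, so it is passed through.
    merged["filetype"] = config.get("filetype")
    return merged


print(effective_general_params({"filetype": "insert", "host": "db.example.com"}))
```

Here db.example.com is a placeholder host name, not a real endpoint.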

Insert scenario configuration parameters

filetype must be set to insert in the insertion scenario. See General configuration parameters for details of this parameter and other general parameters.

keep_trying: Whether to keep retrying when an insert fails. The default is no. Available in v3.0.9 and later.
trying_interval: The interval between insert retries. The value must be a positive number. Takes effect only when keep_trying is enabled. Available in v3.0.9 and later.

Stream processing related configuration parameters

The parameters for creating streams are configured in stream in the JSON configuration file, as shown below.

stream_name: Name of the stream. Mandatory.
stream_stb: Name of the supertable for the stream. Mandatory.
stream_sql: SQL statement for the stream to process. Mandatory.
trigger_mode: Triggering mode for stream processing. Optional.
watermark: Watermark for stream processing. Optional.
drop: Whether to create the stream. Specify yes to create the stream or no to not create the stream.

Super table related configuration parameters

The parameters for creating super tables are configured in super_tables in the JSON configuration file, as shown below.

name: Super table name, mandatory, no default value.
child_table_exists: Whether the child tables already exist. Valid values are "yes" and "no". The default value is "no".
childtable_count: The number of child tables. The default value is 10.
childtable_prefix: The prefix of the child table names. Mandatory, no default value.
escape_character: Whether the super table and child table names contain escape characters. Valid values are "yes" and "no". The default value is "no".
auto_create_table: Takes effect only when insert_mode is taosc, rest, or stmt and child_table_exists is "no". "yes" means that taosBenchmark automatically creates tables that do not exist while inserting data; "no" means that taosBenchmark creates all tables before inserting.
batch_create_tbl_num: The number of tables per batch when creating child tables. The default value is 10. Note: the actual number of tables per batch may differ from this value. If the generated SQL statement exceeds the maximum supported length, it is automatically truncated and executed, and creation continues with the remaining tables.
data_source: The source of the generated data. Valid values are "rand" and "sample". By default, taosBenchmark generates the data randomly. When "sample" is used, taosBenchmark uses the data in the file specified by the sample_file parameter.
insert_mode: The insertion mode. Valid values are taosc, rest, stmt, sml, and sml-rest, which correspond to normal write, RESTful interface write, parameter binding interface write, schemaless interface write, and RESTful schemaless interface write (provided by taosAdapter), respectively. The default value is taosc.
non_stop_mode: Whether to keep writing continuously. If "yes", insert_rows is ignored and writing does not stop until the program is stopped with Ctrl + C. The default value is "no", i.e., taosBenchmark stops writing after the specified number of rows has been written. Note: insert_rows must still be configured as a non-zero positive integer even though it has no effect in continuous write mode.
line_protocol: Insert data using line protocol. Only works when insert_mode is sml or sml-rest. The value can be line, telnet, or json.
tcp_transfer: The communication protocol in telnet mode. Takes effect only when insert_mode is sml-rest and line_protocol is telnet. If not configured, the default protocol is http.
insert_rows: The number of rows inserted per child table. The default value is 0.
childtable_offset: Takes effect only when child_table_exists is "yes". Specifies the offset when fetching the list of child tables from the super table, i.e., the position of the first child table to use.
childtable_limit: Takes effect only when child_table_exists is "yes". Specifies the upper limit on the number of child tables fetched from the super table.
interlace_rows: Enables interleaved insertion mode and specifies the number of rows inserted into each child table at a time. In interleaved insertion mode, taosBenchmark inserts this number of rows into each child table in turn and repeats the process until all data for all child tables has been inserted. The default value is 0, i.e., all data for one child table is inserted before moving on to the next child table.
insert_interval: The insertion interval in ms for interleaved insertion mode. The default value is 0. Takes effect only if interlace_rows (-B/--interlace-rows) is greater than 0. After inserting interlace_rows rows into each child table, the data insertion thread waits for this interval before proceeding to the next round of writes.
partial_col_num: If this value is a positive number n, only the first n columns are written; if it is 0, all columns are written. Takes effect only when insert_mode is taosc or rest.
disorder_ratio: The percentage probability of disordered (i.e., out-of-order) data, in the range [0,50]. The default value is 0, which means there is no disordered data.
disorder_range: The timestamp fallback range for disordered data. A disordered timestamp is generated by subtracting a random value within this range from the timestamp that would be used in the ordered case. Takes effect only if the percentage of disordered data specified by disorder_ratio (-O/--disorder) is greater than 0.
timestamp_step: The timestamp step between inserted rows in each child table, in units consistent with the precision of the database. For example, if the precision is milliseconds, the timestamp step is in milliseconds. The default value is 1.
start_timestamp: The starting timestamp of each child table. The default value is now.
sample_format: The type of the sample data file; for now only "csv" is supported.
sample_file: A CSV file to use as the data source. Takes effect only when data_source is "sample". If the number of rows in the CSV file is less than or equal to prepared_rand, taosBenchmark reads the CSV data cyclically until it has prepared_rand rows; otherwise, taosBenchmark reads only prepared_rand rows. The final number of rows of data generated is the smaller of the two.
use_sample_ts: Takes effect only when data_source is "sample". Indicates whether the first column of the CSV file specified by sample_file is a timestamp column. The default is no. If set to yes, the first column of the CSV file is used as the timestamp. Because timestamps within the same child table cannot repeat, the amount of data generated then depends on the number of rows in the CSV file, and insert_rows is ignored.
tags_file: The CSV file that supplies tag data. Takes effect only when insert_mode is taosc or rest. The final tag values depend on childtable_count. If the CSV file contains fewer rows of tag data than the specified number of child tables, taosBenchmark reads the CSV data cyclically until the number of child tables specified by childtable_count has been generated. Otherwise, taosBenchmark reads only childtable_count rows of tag data. The final number of child tables generated is the smaller of the two.
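
To make the interaction of start_timestamp, timestamp_step, disorder_ratio, and disorder_range concrete, here is a small Python sketch of the behavior described above. It is an approximation for illustration, not taosBenchmark's actual generator: the function name is hypothetical, timestamps are plain integers in the database precision, and the choice of a uniform random offset between 1 and disorder_range is an assumption.

```python
import random


def generate_timestamps(start_ts, rows, timestamp_step=1,
                        disorder_ratio=0, disorder_range=1000, rng=None):
    """Return `rows` timestamps for one child table.

    Each row has a disorder_ratio percent chance of being moved back by a
    random amount within disorder_range (assumed uniform, 1..disorder_range).
    """
    if rng is None:
        rng = random.Random()
    timestamps = []
    for i in range(rows):
        ts = start_ts + i * timestamp_step  # the ordered timestamp
        if disorder_ratio > 0 and rng.randrange(100) < disorder_ratio:
            ts -= rng.randint(1, disorder_range)  # fall back in time
        timestamps.append(ts)
    return timestamps


# With disorder_ratio 0 the series is strictly ordered.
print(generate_timestamps(1000, 5, timestamp_step=10))
# [1000, 1010, 1020, 1030, 1040]
```

With a non-zero disorder_ratio, some entries fall behind their neighbors, which is what exercises the out-of-order write path.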

TSMA configuration parameters

The configuration parameters for specifying TSMAs are in tsmas in super_tables.

name: Specifies TSMA name. Mandatory.
function: Specifies TSMA function. Mandatory.
interval: Specifies TSMA interval. Mandatory.
sliding: Specifies time offset for TSMA window. Mandatory.
custom: Specifies custom configurations to attach to the end of the TSMA creation statement. Optional.
start_when_inserted: Specifies the number of inserted rows after which TSMA is started. Optional. The default value is 0.

Tag and Data Column Configuration Parameters

The configuration parameters for specifying super table data columns and tag columns are in columns and tags in super_tables, respectively.

type: The column type. For valid values, refer to the data types supported by TDengine. Note: the JSON data type is special and can be used only for tags. When JSON is used as a tag type, it must be the only tag. In that case, count and len represent the number of key-value pairs in the JSON tag and the length of the value of each key-value pair, respectively. Values are strings by default.
len: The length of this data type. Valid for the NCHAR, BINARY, and JSON data types. If this parameter is configured for other data types, a value of 0 means that the column is always written with a null value; a non-zero value is ignored.
count: The number of consecutive columns of this type, e.g., "count": 4096 generates 4096 columns of the specified type.
name: The name of the column. If used together with count, e.g., "name": "current", "count": 3, the names of the 3 columns are current, current_2, and current_3.
min: The minimum value of the column/tag of this data type.
max: The maximum value of the column/tag of this data type.
values: The set of values for an NCHAR/BINARY column/tag; each value is chosen randomly from this set.
sma: Insert the column into the BSMA. Enter yes or no. The default is no.
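
The way count and name combine can be sketched in a few lines of Python. This follows the naming stated above (current, current_2, current_3) and treats a missing count as 1, which is an assumption based on the sample configuration. The helper name expand_columns is hypothetical, and the real tool may name columns differently.

```python
def expand_columns(columns):
    """Expand column specs with "count" into (name, type) pairs."""
    expanded = []
    for col in columns:
        count = col.get("count", 1)  # assumption: a missing count means 1
        for i in range(1, count + 1):
            # First column keeps the bare name; later ones get a _<i> suffix.
            name = col["name"] if i == 1 else f"{col['name']}_{i}"
            expanded.append((name, col["type"]))
    return expanded


print(expand_columns([
    {"type": "FLOAT", "name": "current", "count": 3},
    {"type": "INT", "name": "voltage"},
]))
# [('current', 'FLOAT'), ('current_2', 'FLOAT'), ('current_3', 'FLOAT'), ('voltage', 'INT')]
```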

Insertion behavior configuration parameters

thread_count: The number of threads used to insert data. The default value is 8.
create_table_thread_count: The number of threads used to create tables. The default value is 8.
connection_pool_size: The number of pre-established connections to the TDengine server. If not configured, it equals the number of threads specified.
result_file: The path to the result output file. The default value is ./output.txt.
confirm_parameter_prompt: A switch that requires the user to confirm at the prompt before continuing. The default value is false.
interlace_rows: Enables interleaved insertion mode and specifies the number of rows inserted into each child table at a time. In interleaved insertion mode, taosBenchmark inserts this number of rows into each child table in turn and repeats the process until all data for all child tables has been inserted. The default value is 0, i.e., all data for one child table is inserted before moving on to the next child table. This parameter can also be configured in super_tables, in which case the setting in super_tables takes precedence and overrides the global setting.
insert_interval: The insertion interval in ms for interleaved insertion mode. The default value is 0. Takes effect only if interlace_rows (-B/--interlace-rows) is greater than 0. After inserting interlace_rows rows into each child table, the data insertion thread waits for this interval before proceeding to the next round of writes. This parameter can also be configured in super_tables, in which case the setting in super_tables takes precedence and overrides the global setting.
num_of_records_per_req: The number of rows written to TDengine per request. The default value is 30000. If the value is too large, the TDengine client driver returns an error, and you need to lower this parameter until writes succeed.
prepared_rand: The number of unique values in the generated random data. A value of 1 means that all data are equal. The default value is 10000.
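
The difference between interleaved and sequential insertion is easiest to see as an ordering of batches. The Python sketch below simulates only that ordering for a single thread. It ignores thread_count, num_of_records_per_req, and insert_interval, and the function name insertion_order is hypothetical.

```python
def insertion_order(tables, insert_rows, interlace_rows=0):
    """Yield (table, first_row, end_row_exclusive) batches in write order."""
    if interlace_rows <= 0:
        # Default: finish one child table before starting the next.
        for table in tables:
            yield (table, 0, insert_rows)
        return
    written = 0
    while written < insert_rows:
        end = min(written + interlace_rows, insert_rows)
        # One round: interlace_rows rows into every child table in turn.
        for table in tables:
            yield (table, written, end)
        written = end


print(list(insertion_order(["d0", "d1"], insert_rows=5, interlace_rows=2)))
# [('d0', 0, 2), ('d1', 0, 2), ('d0', 2, 4), ('d1', 2, 4), ('d0', 4, 5), ('d1', 4, 5)]
```

In this model, insert_interval would correspond to a pause between rounds of the while loop.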

Query scenario configuration parameters

filetype must be set to query in the query scenario. See General configuration parameters for details of this parameter and other general parameters.

Configuration parameters for executing the specified query statement

The configuration parameters for querying the sub-tables or the normal tables are set in specified_table_query.

query_interval: The query interval in seconds. The default value is 0.
threads: The number of threads that execute the query SQL. The default value is 1.
sqls:
- sql: The SQL command to be executed.
- result: The file in which to save the query result. If unspecified, taosBenchmark does not save the result.

Configuration parameters of query super table

The configuration parameters of the super table query are set in super_table_query.

stblname: The name of the super table to be queried. Required.
query_interval: The query interval in seconds. The default value is 0.
threads: The number of threads that execute the query SQL. The default value is 1.
sqls:
- sql: The SQL command to be executed. For a super table query, keep "xxxx" in the SQL command. The program automatically replaces it with the name of each sub-table of the super table.
- result: The file in which to save the query result. If unspecified, taosBenchmark does not save the result.
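
The "xxxx" substitution can be pictured as a simple template expansion. The Python sketch below shows the idea under the assumption that each sub-table name is substituted in turn to produce one statement per sub-table. The helper name and the example query are illustrative only.

```python
def expand_super_table_sql(sql_template, child_tables, placeholder="xxxx"):
    """Return one SQL statement per child table, replacing the placeholder."""
    if placeholder not in sql_template:
        raise ValueError(f"SQL must contain the placeholder {placeholder!r}")
    return [sql_template.replace(placeholder, name) for name in child_tables]


for statement in expand_super_table_sql("select count(*) from xxxx", ["d0", "d1"]):
    print(statement)
# select count(*) from d0
# select count(*) from d1
```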
